(54) Title: ZINC FINGER PROTEIN DERIVATIVES AND METHODS THEREFOR

(57) Abstract

Zinc finger proteins of the Cys2His2 type represent a class of malleable DNA binding proteins which may be selected to bind diverse
sequences. Typically, zinc finger proteins containing three zinc finger domains, like the murine transcription factor Zif268 and the human
transcription factor Sp1, bind nine contiguous base pairs (bp). To create a class of proteins which would be generally applicable to
target unique sites within complex genomes, the present invention provides a polypeptide linker that fuses two three-finger proteins. Two
six-fingered proteins were created and demonstrated to bind 18 contiguous bp of DNA in a sequence specific fashion. Expression of these
proteins as fusions to activation or repression domains allows transcription to be specifically up or down modulated within cells. Polydactyl
zinc finger proteins are broadly applicable as genome-specific transcriptional switches in gene therapy strategies and the development
of novel transgenic plants and animals. Such proteins are useful for inhibiting, activating or enhancing gene expression from a zinc
finger-nucleotide binding motif containing promoter or other transcriptional control element, as well as a structural gene or RNA sequence.
ZINC FINGER PROTEIN DERIVATIVES AND METHODS THEREFOR
BACKGROUND OF THE INVENTION
1. Field of the Invention

This invention relates generally to the field of regulation of gene expression and
specifically to methods of modulating gene expression by utilizing polypeptides derived
from zinc finger-nucleotide binding proteins.

2. Description of Related Art

Transcriptional regulation is primarily achieved by the sequence-specific binding of
proteins to DNA and RNA. Of the known protein motifs involved in the sequence-specific
recognition of DNA, the zinc finger protein is unique in its modular nature. To
date, zinc finger proteins have been identified which contain between 2 and 37 modules.
More than two hundred proteins, many of them transcription factors, have been shown
to possess zinc finger domains. Zinc fingers connect transcription factors to their target
genes mainly by binding to specific sequences of DNA base pairs - the "rungs" in the
DNA "ladder".

Zinc finger modules are approximately 30 amino acid-long motifs found in a wide
variety of transcription regulatory proteins in eukaryotic organisms. As the name
implies, this nucleic acid binding protein domain is folded around a zinc ion. The zinc
finger domain was first recognized in the transcription factor TFIIIA from Xenopus
oocytes (Miller, et al., EMBO, 4:1609-1614, 1985; Brown, et al., FEBS Lett., 186:271-274,
1985). This protein consists of nine imperfect repeats of a consensus sequence:

(Tyr, Phe)-X-Cys-X₂₋₄-Cys-X₃-Phe-X₅-Leu-X₂-His-X₃₋₄-His-X₂₋₆ (SEQ ID NO: 1)
where X is any amino acid.
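
As an illustration only, a consensus of this kind can be written as a regular expression and used to scan a protein sequence. The sketch below assumes the spacer lengths commonly reported for the Cys2His2 repeat (X2-4, X3, X5, X2 and X3-4) and ignores the spacer that follows the last histidine; the pattern name, the helper function and the test peptide are hypothetical and are not part of the specification.

```python
import re

# Assumed spacer lengths for the Cys2His2 consensus; "." stands for X (any amino acid).
CYS2HIS2 = re.compile(
    r"[YF]"      # Tyr or Phe
    r".C"        # X-Cys
    r".{2,4}C"   # X(2-4)-Cys
    r".{3}F"     # X(3)-Phe
    r".{5}L"     # X(5)-Leu
    r".{2}H"     # X(2)-His
    r".{3,4}H"   # X(3-4)-His
)

def find_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping consensus match."""
    return [(m.start(), m.group()) for m in CYS2HIS2.finditer(protein.upper())]

# Hypothetical test peptide built to fit the consensus, with short flanking residues.
peptide = "GSRPYACPVESCDRRFSRSDELTRHIRIHTGQKP"
print(find_fingers(peptide))
```

Because the natural repeats are described as imperfect, a strict pattern like this one would be expected to miss fingers that deviate at a conserved position.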
Like TFIIIA, most zinc finger proteins have conserved cysteine and histidine residues
that tetrahedrally coordinate the single zinc atom in each finger domain. The structures
of individual zinc finger peptides of this type (containing two cysteines and two
histidines), such as those found in the yeast protein ADR1, the human male associated
protein ZFY, the HIV enhancer protein and the Xenopus protein Xfin, have been solved
by high resolution NMR methods (Kochoyan, et al., Biochemistry, 30:3371-3386, 1991;
Omichinski, et al., Biochemistry, 29:9324-9334, 1990; Lee, et al., Science, 245:635-637,
1989) and detailed models for the interaction of zinc fingers and DNA have been
proposed (Berg, 1988; Berg, 1990; Churchill, et al., 1990). Moreover, the structure of
a three finger polypeptide-DNA complex derived from the mouse immediate early
protein zif268 (also known as Krox-24) has been solved by x-ray crystallography
(Pavletich and Pabo, Science, 252:809-817, 1991). Each finger contains an antiparallel
β-turn, a finger tip region and a short amphipathic α-helix which, in the case of zif268
zinc fingers, binds in the major groove of DNA. In addition, the conserved hydrophobic
amino acids and zinc coordination by the cysteine and histidine residues stabilize the
structure of the individual finger domain.

While the prototype zinc finger protein TFIIIA contains an array of nine zinc fingers
which binds a 43 bp sequence within the 5S RNA genes, regulatory proteins of the zif268
class (Krox-20, Sp1, for example) contain only three zinc fingers within a much larger
polypeptide. The three zinc fingers of zif268 each recognize a 3 bp subsite within a 9 bp
recognition sequence. Most of the DNA contacts made by zif268 are with phosphates
and with guanine residues on one DNA strand in the major groove of the DNA helix. In
contrast, the mechanism of TFIIIA binding to DNA is more complex. The amino-terminal
3 zinc fingers recognize a 13 bp sequence and bind in the major groove. Similar
to zif268, these fingers also make guanine contacts primarily on one strand of the DNA.
Unlike the zif268 class of proteins, zinc fingers 4 and 6 of TFIIIA each bind either in or
across the minor groove, bringing fingers 5 and 7 through 9 back into contact with the
major groove (Clemens, et al., Proc. Natl. Acad. Sci. USA, 89:10822-10826, 1992).



WO 98/54311 PCT/US98/10801

The crystal structure of zif268 indicates that specific histidine (non-zinc coordinating
his residues) and arginine residues on the surface of the α-helix participate in DNA
recognition. Specifically, the charged amino acids immediately preceding the α-helix
and at helix positions 2, 3, and 6 (immediately preceding the conserved histidine)
participate in hydrogen bonding to DNA guanines. Similar to finger 2 of the regulatory
protein Krox-20 and fingers 1 and 3 of Sp1, finger 2 of TFIIIA contains histidine and
arginine residues at these DNA contact positions; further, each of these zinc fingers
minimally recognizes the sequence GGG. Finger swap experiments between
transcription factor Sp1 and Krox-20 have confirmed the 3-bp zinc finger recognition
code for this class of finger proteins (Nardelli, et al., Nature, 349:175-178, 1989).
Mutagenesis experiments have also shown the importance of these amino acids in
specifying DNA recognition. It would be desirable to ascertain a simple code which
specifies zinc finger-nucleotide recognition. If such a code could be deciphered, then
zinc finger polypeptides might be designed to bind any chosen DNA sequence. The
complex of such a polypeptide and its recognition sequence might be utilized to modulate
(up or down) the transcriptional activity of the gene containing this sequence.

Zinc finger proteins have also been reported which bind to RNA. Clemens, et al.
(Science, 260:530, 1993) found that fingers 4 to 7 of TFIIIA contribute 95% of the free
energy of TFIIIA binding to 5S rRNA, whereas fingers 1 to 3 make a similar contribution
in binding the promoter of the 5S gene. Comparison of the two known 5S RNA binding
proteins, TFIIIA and p43, reveals few homologies other than the consensus zinc ligands
(C and H), hydrophobic amino acids and a threonine-tryptophan-threonine triplet motif
in finger 6.

In order to redesign zinc fingers, new selective strategies must be developed and
additional information on the structural basis of sequence-specific nucleotide recognition
is required. Current protein engineering efforts utilize design strategies based on
sequence and/or structural analogy. While such a strategy may be sufficient for the
transfer of motifs, it limits the ability to produce novel nucleotide binding motifs not
known in nature. Indeed, the redesign of zinc fingers utilizing an analogy based strategy
has met with only modest success (Desjarlais and Berg, Proteins, 12:101, 1992).

As a consequence, there exists a need for new strategies for designing additional zinc
fingers with specific recognition sites as well as novel zinc fingers for enhancing or
repressing gene expression.
SUMMARY OF THE INVENTION

The invention provides an isolated zinc finger-nucleotide binding polypeptide variant
comprising at least two zinc finger modules that bind to a cellular nucleotide sequence
and modulate the function of the cellular nucleotide sequence. The variant binds to
either DNA or RNA and may enhance or suppress transcription from a promoter or from
within a transcribed region of a structural gene. The cellular nucleotide sequence may
be a sequence which is a naturally occurring sequence in the cell, or it may be a
viral-derived nucleotide sequence in the cell.

In another embodiment the invention provides a pharmaceutical composition comprising
a therapeutically effective amount of a zinc finger-nucleotide binding polypeptide
derivative or a therapeutically effective amount of a nucleotide sequence which encodes
a zinc finger-nucleotide binding polypeptide derivative, wherein the derivative binds to
a cellular nucleotide sequence to modulate the function of the cellular nucleotide
sequence, in combination with a pharmaceutically acceptable carrier.

In a further embodiment, the invention provides a method for inhibiting a cellular
nucleotide sequence comprising a zinc finger-nucleotide binding motif, the method
comprising contacting the motif with a zinc finger-nucleotide binding polypeptide
derivative which binds the motif.

In yet a further embodiment, the invention provides a method for obtaining an isolated
zinc finger-nucleotide binding polypeptide variant which binds to a cellular nucleotide
sequence comprising identifying the amino acids in a zinc finger-nucleotide binding
polypeptide that bind to a first cellular nucleotide sequence and modulate the function
of the nucleotide sequence; creating an expression library encoding the polypeptide
variant containing randomized substitution of the amino acids identified; expressing the
library in a suitable host cell; and isolating a clone that produces a polypeptide variant
that binds to a second cellular nucleotide sequence and modulates the function of the
second nucleotide sequence. Preferably, the expression library encoding the polypeptide
variant is a phage display library.
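
To give a sense of scale for a randomized library of this kind, the following sketch counts the variants produced when a given number of positions are fully randomized. The choice of six positions and of an NNK codon scheme (32 codons covering all 20 amino acids) are assumptions made for illustration; neither is prescribed by the method as summarized here, and the function name is hypothetical.

```python
def library_sizes(n_positions: int, codons_per_position: int = 32,
                  amino_acids: int = 20) -> tuple[int, int]:
    """Return (distinct DNA sequences, distinct protein sequences) for a library."""
    return codons_per_position ** n_positions, amino_acids ** n_positions

# Example: six randomized positions under the assumed NNK scheme.
dna, protein = library_sizes(6)
print(f"{dna:,} DNA variants encode {protein:,} protein variants")
```

Numbers of this size are one reason a display-based selection, rather than clone-by-clone screening, would be attractive for isolating binders.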

The invention also provides a method of treating a subject with a cell proliferative
disorder, wherein the disorder is associated with the modulation of gene expression
associated with a zinc finger-nucleotide binding motif, comprising contacting the zinc
finger-nucleotide binding motif with an effective amount of a zinc finger-nucleotide
binding polypeptide derivative that binds to the zinc finger-nucleotide binding motif to
modulate activity of the gene.

Further, the invention provides a method for identifying a protein which modulates the
function of a cellular nucleotide sequence and binds to a zinc finger-nucleotide binding
motif comprising incubating components comprising a nucleotide sequence encoding the
putative modulating protein operably linked to a first inducible promoter, and a reporter
gene operably linked to a second inducible promoter and a zinc finger-nucleotide binding
motif, wherein the incubating is carried out under conditions sufficient to allow the
components to interact; and measuring the effect of the putative modulating protein on
the expression of the reporter gene.
BRIEF DESCRIPTION OF THE DRAWINGS

FIGURE 1 shows a model for the interaction of the zinc fingers of TFIIIA with the
internal promoter of the 5S RNA gene.

FIGURE 2A shows the amino acid sequence of the first three amino terminal zinc fingers
of TFIIIA.

FIGURE 2B shows the nucleotide sequence of the minimal binding site for zf1-3.

FIGURE 3 shows a gel mobility shift assay for the binding of zf1-3 to a 23 bp 32P-labeled
double stranded oligonucleotide.

FIGURE 4 shows an autoradiogram of in vitro transcription indicating that zf1-3 blocks
transcription by T7 RNA polymerase.

FIGURE 5 shows that binding of zf1-3 to its recognition sequence blocks transcription from
a T7 RNA polymerase promoter located nearby. The percent of DNA molecules
bound by zf1-3 in a gel mobility shift assay (x-axis) is plotted against percent inhibition
of T7 RNA polymerase transcription (y-axis).

FIGURE 6 is an autoradiogram showing that zf1-3 blocks eukaryotic RNA polymerase III
transcription in an in vitro transcription system derived from unfertilized Xenopus eggs.

FIGURE 7 shows the nucleotide and deduced amino acid sequence for the zinc fingers
of zif268 which were cloned in pComb3.5.
FIGURE 8 shows the amino acid sequence of the Zif268 protein and the hairpin DNA
used for phage selection. (A) shows the conserved features of each zinc finger. (B) 
shows the hairpin DNA containing the 9-bp consensus binding site. 

FIGURE 9 is a table listing the six randomized residues of fingers 1, 2, and 3.

FIGURE 10 shows an SDS-PAGE of Zif268 variant A14 before IPTG induction (lane
2); after IPTG induction (lane 3); cytoplasmic fraction after removal of inclusion bodies
(lane 4); inclusion bodies containing zinc finger peptide (lane 5); and mutant Zif268
(lane 6). Lane 1 is MW Standards (kD).

FIGURE 11 is a table indicating k_on, association rate; k_off, dissociation rate; and K_d,
equilibrium dissociation constant, for each protein.

FIGURE 12 shows dissociation rate (k_off) of wild-type Zif268 protein (WT) (□) and its
variant C7 (○), by real-time changes in surface plasmon resonance.

FIGURE 13A and B show the nucleotide and amino acid sequence of Zif268-Jun (SEQ
ID NOS: 33 and 34).

FIGURE 14A and B show the nucleotide and amino acid sequence of Zif268-Fos (SEQ
ID NOS: 35 and 36).

FIGURE 15 shows the nucleotide and amino acid sequence of the three finger
construction of C7 zinc finger (SEQ ID NOS: 41 and 42).

FIGURE 16A and B show the nucleotide and amino acid sequence of Zif268-Zif268
linked by a TGEKP linker (SEQ ID NOS: 43 and 44).
FIGURE 17 shows gel shift reactions. FIGURE 17A shows binding of the maltose
binding protein fusions (MBP)-C7-C7 and MBP-Sp1C-C7 with duplex DNA
oligonucleotides containing various target sequences. (A) MBP-C7-C7 protein was used
to shift the double-stranded DNA probes containing the target sequences listed on top of
each panel (from left to right: C7-C7 site, Sp1C-C7 site, C7 site, and (GCG)6 site). The
protein concentration is given in nM beneath each lane with a 2-fold serial dilution from
left to right in each panel. FIGURE 17B shows MBP-Sp1C-C7 protein titrated into gel
shift reactions with probes containing target sequences (from left to right: Sp1C-C7 site,
C7-C7 site, C7 site, and Sp1C site) as listed on top of each panel. The protein
concentration is labeled in nM beneath each lane, with a 2-fold serial dilution from left
to right in each panel.

FIGURE 18 is a DNase I footprint of MBP-C7-C7 and MBP-Sp1C-C7. A 220 bp
radiolabeled fragment containing the binding site for MBP-C7-C7 (lanes 1-3) or
MBP-Sp1C-C7 (lanes 4-6) was incubated with either 20 µg/ml of BSA (lanes 2 and 5) or the
cognate binding protein (300 nM, lanes 3 and 6) in 1x Binding Buffer for 30 min.
DNase I footprinting was then performed using the SureTrack Footprinting Kit
(Pharmacia) according to the manufacturer's instructions. Boxed region indicates the
binding site sequence. Asterisk indicates the 3'-labeled strand. Lanes 1 and 4: G+A
ladders.

FIGURE 19 shows transcriptional regulation mediated by six-finger proteins in living
cells. FIGURE 19A: HeLa cells were transiently transfected in triplicate with 2.5 µg of
the indicated reporter plasmids and 2.5 µg C7-C7-VP16 expression plasmid. Luciferase
activities were measured 48 h later, and normalized to the control β-galactosidase
activity. The relative light units are given on top of each column, with standard deviations
shown as error bars. FIGURE 19B: HeLa cells were transfected with 2.5 µg of the indicated
reporter plasmids and either no C7-C7-KRAB expression plasmid or 1 µg of the C7-C7-KRAB
expression plasmid by using Lipofectamine (Gibco-BRL) as the transfection reagent.
Luciferase activities were measured 48 h later, and normalized to the control
β-galactosidase activity. The relative light unit values are labeled on top of each column,
with standard deviations shown as error bars.
DETAILED DESCRIPTION OF THE INVENTION

The present invention provides an isolated zinc finger-nucleotide binding polypeptide
variant comprising at least two zinc finger modules that bind to a cellular nucleotide
sequence and modulate the function of the cellular nucleotide sequence. The polypeptide
variant may enhance or suppress transcription of a gene, and may bind to DNA or RNA.
In addition, the invention provides a pharmaceutical composition comprising a
therapeutically effective amount of a zinc finger-nucleotide binding polypeptide
derivative or a therapeutically effective amount of a nucleotide sequence that encodes a
zinc finger-nucleotide binding polypeptide derivative, wherein the derivative binds to a
cellular nucleotide sequence to modulate the function of the cellular nucleotide sequence,
in combination with a pharmaceutically acceptable carrier. The invention also provides
a screening method for obtaining a zinc finger-nucleotide binding polypeptide variant
which binds to a cellular or viral nucleotide sequence.

A zinc finger-nucleotide binding polypeptide "variant" or "derivative" refers to a
polypeptide which is a mutagenized form of a zinc finger protein or one produced
through recombination. A variant may be a hybrid which contains zinc finger domain(s)
from one protein linked to zinc finger domain(s) of a second protein, for example. The
domains may be wild type or mutagenized. A "variant" or "derivative" includes a
truncated form of a wild type zinc finger protein, which contains fewer than the original
number of fingers in the wild type protein. Examples of zinc finger-nucleotide binding
polypeptides from which a derivative or variant may be produced include TFIIIA and
zif268.

As used herein, a "zinc finger-nucleotide binding motif" refers to any two- or
three-dimensional feature of a nucleotide segment to which a zinc finger-nucleotide binding
derivative polypeptide binds with specificity. Included within this definition are
nucleotide sequences, generally of five nucleotides or less, as well as the three
dimensional aspects of the DNA double helix, such as the major and minor grooves, the
face of the helix, and the like. The motif is typically any sequence of suitable length to
which the zinc finger polypeptide can bind. For example, a three finger polypeptide
binds to a motif typically having about 9 to about 14 base pairs. Preferably, the
recognition sequence will be at least about 16 base pairs to ensure specificity within the
genome. Therefore, the invention provides zinc finger-nucleotide binding polypeptides
of any specificity, and the zinc finger binding motif can be any sequence designed by the
experimenter or to which the zinc finger protein binds. The motif may be found in any
DNA or RNA sequence, including regulatory sequences, exons, introns, or any
non-coding sequence.
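
The preference for a longer recognition sequence follows from simple counting. As a rough sketch, assuming a random genome with equal base frequencies and a hypothetical genome size of 3 x 10^9 bp (both assumptions introduced here, as is the function name), the expected number of occurrences of one specific n-bp site is approximately the genome size divided by 4^n:

```python
def expected_sites(site_length_bp: int, genome_size_bp: float = 3e9) -> float:
    """Expected occurrences of one specific site on one strand of a random genome."""
    return genome_size_bp / 4 ** site_length_bp

# Compare a three-finger site (9 bp) with 16 bp and six-finger (18 bp) sites.
for n in (9, 16, 18):
    print(f"{n:2d} bp site: ~{expected_sites(n):.3g} expected occurrences")
```

Under these assumptions a 9 bp site is expected thousands of times in such a genome, whereas a 16 bp or 18 bp site is expected less than once.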

In the practice of this invention it is not necessary that the zinc finger-nucleotide binding
motif be known in order to obtain a zinc-finger nucleotide binding variant polypeptide.
Although zinc finger proteins have so far been identified only in eukaryotes, it is
specifically contemplated within the scope of this invention that zinc finger-nucleotide
binding motifs can be identified in non-eukaryotic DNA or RNA, especially in the native
promoters of bacteria and viruses by the binding thereto of the genetically modified
isolated constructs of this invention that preserve the well known structural
characteristics of the zinc finger, but differ from zinc finger proteins found in nature by
their method of production, as well as their amino acid sequences and three-dimensional
structures.

The characteristic structure of the known wild type zinc finger proteins is made up of
from two to as many as 37 modular tandem repeats, with each repeat forming a "finger"
holding a zinc atom in tetrahedral coordination by means of a pair of conserved cysteines
and a pair of conserved histidines. Generally each finger also contains conserved
hydrophobic amino acids that interact to form a hydrophobic core that helps the module
maintain its shape.

The zinc finger-nucleotide binding polypeptide variant of the invention comprises at least
two and preferably at least about four zinc finger modules that bind to a cellular
nucleotide sequence and modulate the function of the cellular nucleotide sequence. The
term "cellular nucleotide sequence" refers to a nucleotide sequence which is present
within the cell. It is not necessary that the sequence be a naturally occurring sequence
of the cell. For example, a retroviral genome which is integrated within a host's cellular
DNA would be considered a "cellular nucleotide sequence". The cellular nucleotide
sequence can be DNA or RNA and includes both introns and exons, DNA and RNA.
The cell and/or cellular nucleotide sequence can be prokaryotic or eukaryotic, including
a yeast, virus, or plant nucleotide sequence.

The term "modulate" refers to the suppression, enhancement or induction of a function.
For example, the zinc finger-nucleotide binding polypeptide variant of the invention may
modulate a promoter sequence by binding to a motif within the promoter, thereby
enhancing or suppressing transcription of a gene operatively linked to the promoter
cellular nucleotide sequence. Alternatively, modulation may include inhibition of
transcription of a gene where the zinc finger-nucleotide binding polypeptide variant
binds to the structural gene and blocks DNA-dependent RNA polymerase from reading
through the gene, thus inhibiting transcription of the gene. The structural gene may be
a normal cellular gene or an oncogene, for example. Alternatively, modulation may
include inhibition of translation of a transcript.

The promoter region of a gene includes the regulatory elements that typically lie 5' to a
structural gene. If a gene is to be activated, proteins known as transcription factors attach
to the promoter region of the gene. This assembly resembles an "on switch" by enabling
an enzyme to transcribe a second genetic segment from DNA into RNA. In most cases
the resulting RNA molecule serves as a template for synthesis of a specific protein;
sometimes RNA itself is the final product.

The promoter region may be a normal cellular promoter or, for example, an
onco-promoter. An onco-promoter is generally a virus-derived promoter. For example, the
long terminal repeat (LTR) of retroviruses is a promoter region which may be a target
for a zinc finger binding polypeptide variant of the invention. Promoters from members
of the Lentivirus group, which include such pathogens as human T-cell lymphotrophic
virus (HTLV) 1 and 2, or human immune deficiency virus (HIV) 1 or 2, are examples of
viral promoter regions which may be targeted for transcriptional modulation by a zinc
finger binding polypeptide of the invention.

The zinc finger-nucleotide binding polypeptide derivatives or variants of the invention
include polypeptides that bind to a cellular nucleotide sequence such as DNA, RNA or
both. A zinc finger-nucleotide binding polypeptide which binds to DNA, and
specifically, the zinc finger domains which bind to DNA, can be readily identified by
examination of the "linker" region between two zinc finger domains. The linker amino
acid sequence TGEK(P) (SEQ ID NO: 32) is typically indicative of zinc finger domains
which bind to a DNA sequence. Therefore, one can determine whether a particular zinc
finger-nucleotide binding polypeptide preferably binds to DNA or RNA by examination
of the linker amino acids.
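
The linker examination described above can be sketched as a simple pattern search. The function name and the example sequence are hypothetical, and treating the final proline as optional is an interpretation of the parenthesized P in TGEK(P); a literal search of this kind would miss any linker that departs from that sequence.

```python
import re

# TGEK followed by an optional P, following the TGEK(P) notation.
LINKER = re.compile(r"TGEKP?")

def linker_positions(protein: str) -> list[int]:
    """Return the 0-based start index of each TGEK(P) linker in the sequence."""
    return [m.start() for m in LINKER.finditer(protein.upper())]

# Hypothetical two-finger fragment with one linker between the fingers.
fragment = "YACPVESCDRRFSRSDELTRHIRIHTGEKPFQCRICMRNFSRSDHLTTHIRTH"
print(linker_positions(fragment))
```

The presence of such a linker between adjacent fingers would then be read as an indication of DNA binding, in the sense described in the paragraph above.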

In one embodiment, a method of the invention includes a method for inhibiting or
suppressing the function of a cellular nucleotide sequence comprising a zinc
finger-nucleotide binding motif which comprises contacting the zinc finger-nucleotide binding
motif with an effective amount of a zinc finger-nucleotide binding polypeptide derivative
that binds to the motif. In the case where the cellular nucleotide sequence is a promoter,
the method includes inhibiting the transcriptional transactivation of a promoter
containing a zinc finger-DNA binding motif. The term "inhibiting" refers to the
suppression of the level of activation of transcription of a structural gene operably linked
to a promoter containing a zinc finger-nucleotide binding motif, for example. In
addition, the zinc finger-nucleotide binding polypeptide derivative may bind a motif
within a structural gene or within an RNA sequence.

The term "effective amount" includes that amount which results in the deactivation of a
previously activated promoter or that amount which results in the inactivation of a
promoter containing a zinc finger-nucleotide binding motif, or that amount which blocks
transcription of a structural gene or translation of RNA. The amount of zinc finger
derived-nucleotide binding polypeptide required is that amount necessary to either
displace a native zinc finger-nucleotide binding protein in an existing protein/promoter
complex, or that amount necessary to compete with the native zinc finger-nucleotide
binding protein to form a complex with the promoter itself. Similarly, the amount
required to block a structural gene or RNA is that amount which binds to and blocks
RNA polymerase from reading through on the gene or that amount which inhibits
translation, respectively. Preferably, the method is performed intracellularly. By
functionally inactivating a promoter or structural gene, transcription or translation is
suppressed. Delivery of an effective amount of the inhibitory protein for binding to or
"contacting" the cellular nucleotide sequence containing the zinc finger-nucleotide
binding protein motif can be accomplished by one of the mechanisms described herein,
such as by retroviral vectors or liposomes, or other methods well known in the art.

The zinc finger-nucleotide binding polypeptide derivative is derived or produced from
a wild type zinc finger protein by truncation or expansion, or as a variant of the wild
type-derived polypeptide by a process of site directed mutagenesis, or by a combination
of the procedures.

The term "truncated" refers to a zinc finger-nucleotide binding polypeptide derivative
that contains fewer than the full number of zinc fingers found in the native zinc finger
binding protein or that has been deleted of non-desired sequences. For example,
truncation of the zinc finger-nucleotide binding protein TFIIIA, which naturally contains
nine zinc fingers, might be a polypeptide with only zinc fingers one through three.
Expansion refers to a zinc finger polypeptide to which additional zinc finger modules
have been added. For example, TFIIIA may be extended to 12 fingers by adding 3 zinc
finger domains. In addition, a truncated zinc finger-nucleotide binding polypeptide may
include zinc finger modules from more than one wild type polypeptide, thus resulting in

The term "mutagenized" refers to a zinc finger derived-nucleotide binding polypeptide
that has been obtained by performing any of the known methods for accomplishing
random or site-directed mutagenesis of the DNA encoding the protein. For instance, in
TFIIIA, mutagenesis can be performed to replace nonconserved residues in one or more
of the repeats of the consensus sequence. Truncated zinc finger-nucleotide binding
proteins can also be mutagenized.

Examples of known zinc finger-nucleotide binding proteins that can be truncated,
expanded, and/or mutagenized according to the present invention in order to inhibit the
function of a cellular sequence containing a zinc finger-nucleotide binding motif include
TFIIIA and zif268. Other zinc finger-nucleotide binding proteins will be known to those
of skill in the art.

The invention also provides a pharmaceutical composition comprising a therapeutically
effective amount of a zinc finger-nucleotide binding polypeptide derivative or a
therapeutically effective amount of a nucleotide sequence which encodes a zinc
finger-nucleotide binding polypeptide derivative, wherein the derivative binds to a cellular
nucleotide sequence to modulate the function of the cellular nucleotide sequence, in
combination with a pharmaceutically acceptable carrier. Pharmaceutical compositions
containing one or more of the different zinc finger-nucleotide binding derivatives
described herein are useful in the therapeutic methods of the invention.

As used herein, the terms "pharmaceutically acceptable", "physiologically tolerable" and
grammatical variations thereof, as they refer to compositions, carriers, diluents and
reagents, are used interchangeably and represent that the materials are capable of
administration to or upon a human without the production of undesirable physiological
effects such as nausea, dizziness, gastric upset and the like which would be to a degree
that would prohibit administration of the composition.
The preparation of a pharmacological composition that contains active ingredients
dissolved or dispersed therein is well understood in the art. Typically such compositions
are prepared as sterile injectables either as liquid solutions or suspensions, aqueous or
non-aqueous; however, solid forms suitable for solution, or suspension, in liquid prior
to use can also be prepared. The preparation can also be emulsified.

The active ingredient can be mixed with excipients which are pharmaceutically
acceptable and compatible with the active ingredient and in amounts suitable for use in
the therapeutic methods described herein. Suitable excipients are, for example, water,
saline, dextrose, glycerol, ethanol or the like and combinations thereof. In addition, if
desired, the composition can contain minor amounts of auxiliary substances such as
wetting or emulsifying agents, as well as pH buffering agents and the like which enhance
the effectiveness of the active ingredient.

The therapeutic pharmaceutical composition of the present invention can include
pharmaceutically acceptable salts of the components therein. Pharmaceutically
acceptable salts include the acid addition salts (formed with the free amino groups of the
polypeptide) that are formed with inorganic acids such as, for example, hydrochloric or
phosphoric acids, or such organic acids as acetic, tartaric, mandelic and the like. Salts
formed with the free carboxyl groups can also be derived from inorganic bases such as,
for example, sodium, potassium, ammonium, calcium or ferric hydroxides, and such
organic bases as isopropylamine, trimethylamine, 2-ethylamino ethanol, histidine,
procaine and the like.

Physiologically tolerable carriers are well known in the art. Exemplary of liquid carriers
are sterile aqueous solutions that contain no materials in addition to the active ingredients
and water, or contain a buffer such as sodium phosphate at physiological pH value,
physiological saline or both, such as phosphate-buffered saline. Still further, aqueous
carriers can contain more than one buffer salt, as well as salts such as sodium and
potassium chlorides, dextrose, propylene glycol, polyethylene glycol and other solutes.

Liquid compositions can also contain liquid phases in addition to and to the exclusion of
water. Exemplary of such additional liquid phases are glycerin, vegetable oils such as
cottonseed oil, organic esters such as ethyl oleate, and water-oil emulsions.

The invention includes a nucleotide sequence encoding a zinc finger-nucleotide binding
polypeptide variant. DNA sequences encoding the zinc finger-nucleotide binding
polypeptides of the invention, including native, truncated, and expanded polypeptides,
can be obtained by several methods. For example, the DNA can be isolated using
hybridization procedures which are well known in the art. These include, but are not
limited to: (1) hybridization of probes to genomic or cDNA libraries to detect shared
nucleotide sequences; (2) antibody screening of expression libraries to detect shared
structural features; and (3) synthesis by the polymerase chain reaction (PCR). RNA
sequences of the invention can be obtained by methods known in the art (see, for
example, Current Protocols in Molecular Biology, Ausubel, et al. eds., 1989).

The development of specific DNA sequences encoding zinc finger-nucleotide binding 
proteins of the invention can be obtained by: (1) isolation of a double-stranded DNA
sequence from the genomic DNA; (2) chemical manufacture of a DNA sequence to
provide the necessary codons for the polypeptide of interest; and (3) in vitro synthesis
of a double-stranded DNA sequence by reverse transcription of mRNA isolated from a
eukaryotic donor cell. In the latter case, a double-stranded DNA complement of mRNA
is eventually formed which is generally referred to as cDNA. Of these three methods for
developing specific DNA sequences for use in recombinant procedures, the isolation of 
genomic DNA is the least common. This is especially true when it is desirable to obtain 
the microbial expression of mammalian polypeptides due to the presence of introns. 

For obtaining zinc finger derived-DNA binding polypeptides, the synthesis of DNA
sequences is frequently the method of choice when the entire sequence of amino acid
residues of the desired polypeptide product is known. When the entire sequence of
amino acid residues of the desired polypeptide is not known, the direct synthesis of DNA
sequences is not possible and the method of choice is the formation of cDNA sequences.
Among the standard procedures for isolating cDNA sequences of interest is the formation
of plasmid-carrying cDNA libraries which are derived from reverse transcription of
mRNA which is abundant in donor cells that have a high level of genetic expression.
When used in combination with polymerase chain reaction technology, even rare
expression products can be cloned. In those cases where significant portions of the
amino acid sequence of the polypeptide are known, the production of labeled single or
double-stranded DNA or RNA probe sequences duplicating a sequence putatively present
in the target cDNA may be employed in DNA/DNA hybridization procedures which are
carried out on cloned copies of the cDNA which have been denatured into a
single-stranded form (Jay, et al., Nucleic Acid Research, 11:2325, 1983).

Hybridization procedures are useful for the screening of recombinant clones by using
labeled mixed synthetic oligonucleotide probes where each probe is potentially the
complete complement of a specific DNA sequence in the hybridization sample which
includes a heterogeneous mixture of denatured double-stranded DNA. For such
screening, hybridization is preferably performed on either single-stranded DNA or
denatured double-stranded DNA. Hybridization is particularly useful in the detection of
cDNA clones derived from sources where an extremely low amount of mRNA sequences
relating to the polypeptide of interest are present. By using stringent hybridization
conditions directed to avoid non-specific binding, it is possible, for example, to allow the
autoradiographic visualization of a specific cDNA clone by the hybridization of the
target DNA to that single probe in the mixture which is its complete complement
(Wallace, et al., Nucleic Acid Research, 9:879, 1981; Maniatis, et al., Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, 1982).

Screening procedures which rely on nucleic acid hybridization make it possible to isolate
any gene sequence from any organism, provided the appropriate probe is available.
Oligonucleotide probes, which correspond to a part of the sequence encoding the protein
in question, can be synthesized chemically. This requires that short, oligopeptide
stretches of amino acid sequence must be known. The DNA sequence encoding the
protein can be deduced from the genetic code, however, the degeneracy of the code must 
be taken into account. It is possible to perform a mixed addition reaction when the 
sequence is degenerate. This includes a heterogeneous mixture of denatured
double-stranded DNA. For such screening, hybridization is preferably performed on either
single-stranded DNA or denatured double-stranded DNA.
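
The effect of codon degeneracy on a mixed oligonucleotide probe can be illustrated with a short calculation. The following is a minimal sketch, assuming the standard genetic code; the function names (`codons_for`, `probe_degeneracy`, `probe_pool`) and the example peptides are hypothetical and are not taken from the specification.

```python
from itertools import product
from math import prod

# Standard genetic code, DNA alphabet, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codons_for(residue):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == residue)


def probe_degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    return prod(len(codons_for(r)) for r in peptide)


def probe_pool(peptide):
    """Enumerate the full mixed-probe pool for a short peptide (illustrative)."""
    choices = [codons_for(r) for r in peptide]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Hypothetical peptides chosen only to contrast low and high degeneracy.
    for peptide in ("MKWF", "RSL"):
        print(peptide, probe_degeneracy(peptide))
```

A stretch rich in Met and Trp (one codon each) needs a far smaller probe mixture than one rich in Arg, Ser or Leu (six codons each), which is one reason the choice of the short peptide stretch matters when a mixed addition reaction is used.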

Since the DNA sequences of the invention encode essentially all or part of a zinc finger-
nucleotide binding protein, it is now a routine matter to prepare, subclone, and express
the truncated polypeptide fragments of DNA from this or corresponding DNA sequences.
Alternatively, by utilizing the DNA fragments disclosed herein which define the zinc
finger-nucleotide binding polypeptides of the invention it is possible, in conjunction with
known techniques, to determine the DNA sequences encoding the entire zinc finger-
nucleotide binding protein. Such techniques are described in U.S. 4,394,443 and U.S.
4,446,235 which are incorporated herein by reference.

A cDNA expression library, such as lambda gt11, can be screened indirectly for zinc
finger-nucleotide binding protein or for the zinc finger derived polypeptide having at
least one epitope, using antibodies specific for the zinc finger-nucleotide binding protein.
Such antibodies can be either polyclonally or monoclonally derived and used to detect
expression product indicative of the presence of zinc finger-nucleotide binding protein
cDNA. Alternatively, binding of the derived polypeptides to DNA targets can be assayed
by incorporating radiolabeled DNA into the target site and testing for retardation of
electrophoretic mobility as compared with unbound target site.

A preferred vector used for identification of truncated and/or mutagenized zinc finger- 
nucleotide binding polypeptides is a recombinant DNA (rDNA) molecule containing a 
nucleotide sequence that codes for and is capable of expressing a fusion polypeptide
containing, in the direction of amino- to carboxy-terminus, (1) a prokaryotic secretion
signal domain, (2) a heterologous polypeptide, and (3) a filamentous phage membrane
anchor domain. The vector includes DNA expression control sequences for expressing
the fusion polypeptide, preferably prokaryotic control sequences. 

The filamentous phage membrane anchor is preferably a domain of the cpIII or cpVIII
coat protein capable of associating with the matrix of a filamentous phage particle,
thereby incorporating the fusion polypeptide onto the phage surface.

The secretion signal is a leader peptide domain of a protein that targets the protein to the
periplasmic membrane of gram negative bacteria. A preferred secretion signal is a pelB
secretion signal. The predicted amino acid residue sequences of the secretion signal
domain from two pelB gene product variants from Erwinia carotovora are described in Lei,
et al. (Nature, 331:543-546, 1988).

The leader sequence of the pelB protein has previously been used as a secretion signal
for fusion proteins (Better, et al., Science, 240:1041-1043, 1988; Sastry, et al., Proc.
Natl. Acad. Sci. USA, 86:5728-5732, 1989; and Mullinax, et al., Proc. Natl. Acad. Sci.
USA, 87:8095-8099, 1990). Amino acid residue sequences for other secretion signal
polypeptide domains from E. coli useful in this invention can be found in Oliver, In
Neidhardt, F.C. (ed.), Escherichia coli and Salmonella Typhimurium, American Society
for Microbiology, Washington, D.C., 1:56-69 (1987).

Preferred membrane anchors for the vector are obtainable from filamentous phage M13,
f1, fd, and equivalent filamentous phage. Preferred membrane anchor domains are found
in the coat proteins encoded by gene III and gene VIII. The membrane anchor domain
of a filamentous phage coat protein is a portion of the carboxy terminal region of the coat
protein and includes a region of hydrophobic amino acid residues for spanning a lipid
bilayer membrane, and a region of charged amino acid residues normally found at the
cytoplasmic face of the membrane and extending away from the membrane. In the phage
f1, gene VIII coat protein's membrane spanning region comprises residue Trp-26 through
Lys-40, and the cytoplasmic region comprises the carboxy-terminal 11 residues from 41
to 52 (Ohkawa, et al., J. Biol. Chem., 256:9951-9958, 1981). Thus, the amino acid
residue sequence of a preferred membrane anchor domain is derived from the M13
filamentous phage gene VIII coat protein (also designated cpVIII or CP 8). Gene VIII
coat protein is present on a mature filamentous phage over the majority of the phage
particle with typically about 2500 to 3000 copies of the coat protein.

In addition, the amino acid residue sequence of another preferred membrane anchor
domain is derived from the M13 filamentous phage gene III coat protein (also designated
cpIII). Gene III coat protein is present on a mature filamentous phage at one end of the
phage particle with typically about 4 to 6 copies of the coat protein. For detailed
descriptions of the structure of filamentous phage particles, their coat proteins and
particle assembly, see the reviews by Rasched, et al. (Microbiol. Rev., 50:401-427, 1986;
and Model, et al., in "The Bacteriophages: Vol. 2", R. Calendar, ed., Plenum Publishing
Co., pp. 375-456, 1988).

DNA expression control sequences comprise a set of DNA expression signals for
expressing a structural gene product and include both 5' and 3' elements, as is well
known, operatively linked to the cistron such that the cistron is able to express a
structural gene product. The 5' control sequences define a promoter for initiating
transcription and a ribosome binding site operatively linked at the 5' terminus of the
upstream translatable DNA sequence.

To achieve high levels of gene expression in E. coli, it is necessary to use not only strong
promoters to generate large quantities of mRNA, but also ribosome binding sites to
ensure that the mRNA is efficiently translated. In E. coli, the ribosome binding site
includes an initiation codon (AUG) and a sequence 3-9 nucleotides long located 3-11
nucleotides upstream from the initiation codon (Shine, et al., Nature, 254:34, 1975). The
sequence, AGGAGGU, which is called the Shine-Dalgarno (SD) sequence, is
complementary to the 3' end of E. coli 16S rRNA. Binding of the ribosome to mRNA
and the sequence at the 3' end of the mRNA can be affected by several factors:

(i) The degree of complementarity between the SD sequence and 3' end of the
16S rRNA.

(ii) The spacing and possibly the RNA sequence lying between the SD
sequence and the AUG (Roberts, et al., Proc. Natl. Acad. Sci. USA, 76:760,
1979a; Roberts, et al., Proc. Natl. Acad. Sci. USA, 76:5596, 1979b;
Guarente, et al., Science, 209:1428, 1980; and Guarente, et al., Cell, 20:543,
1980). Optimization is achieved by measuring the level of expression of
genes in plasmids in which this spacing is systematically altered.
Comparison of different mRNAs shows that there are statistically preferred
sequences from positions -20 to +13 (where the A of the AUG is position
0) (Gold, et al., Annu. Rev. Microbiol., 35:365, 1981). Leader sequences
have been shown to influence translation dramatically (Roberts, et al., 1979
a, b supra).

(iii) The nucleotide sequence following the AUG, which affects ribosome
binding (Taniguchi, et al., J. Mol. Biol., 118:533, 1978).
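
Factors (i) and (ii) above can be illustrated by scanning a leader sequence for an SD-like element. The following is a minimal sketch and not a predictive model of translation efficiency: it looks for a 3 to 7 nucleotide stretch that exactly matches part of AGGAGGU and ends 3 to 11 nucleotides upstream of an AUG. The function name `find_sd`, the exact-match rule, the definition of the gap (nucleotides between the end of the match and the A of the AUG) and the example leader are all assumptions made for illustration.

```python
SD_CONSENSUS = "AGGAGGU"  # Shine-Dalgarno sequence as given in the text


def find_sd(mrna, min_len=3, max_len=7, min_gap=3, max_gap=11):
    """Report the longest SD-like match upstream of each AUG.

    Assumption: a match is any exact substring of SD_CONSENSUS, and the gap
    is the number of nucleotides between the match and the A of the AUG.
    """
    mrna = mrna.upper().replace("T", "U")
    hits = []
    aug = mrna.find("AUG")
    while aug != -1:
        best = None  # (matched sequence, gap)
        for gap in range(min_gap, max_gap + 1):
            end = aug - gap
            # Try the longest window first so the strongest match wins.
            for length in range(max_len, min_len - 1, -1):
                begin = end - length
                if begin < 0:
                    continue
                window = mrna[begin:end]
                if window in SD_CONSENSUS:
                    if best is None or length > len(best[0]):
                        best = (window, gap)
                    break
        if best is not None:
            hits.append({"aug": aug, "sd": best[0], "gap": best[1]})
        aug = mrna.find("AUG", aug + 1)
    return hits


if __name__ == "__main__":
    leader = "GGCUAGGAGGUAACAUAUGGCU"  # hypothetical leader sequence
    print(find_sd(leader))
```

Varying the number of nucleotides between the match and the AUG in such a sequence corresponds to the systematic alteration of spacing described in item (ii).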

The 3' control sequences define at least one termination (stop) codon in frame with and 
operatively linked to the heterologous fusion polypeptide. 

In preferred embodiments, the vector utilized includes a prokaryotic origin of replication
or replicon, i.e., a DNA sequence having the ability to direct autonomous replication and
maintenance of the recombinant DNA molecule extrachromosomally in a prokaryotic
host cell, such as a bacterial host cell, transformed therewith. Such origins of replication
are well known in the art. Preferred origins of replication are those that are efficient in
the host organism. A preferred host cell is E. coli. For use of a vector in E. coli, a
preferred origin of replication is ColE1 found in pBR322 and a variety of other common
plasmids. Also preferred is the p15A origin of replication found on pACYC and its
derivatives. The ColE1 and p15A replicons have been extensively utilized in molecular
biology, are available on a variety of plasmids and are described at least by Sambrook,
et al. (Molecular Cloning: A Laboratory Manual, 2nd edition, Cold Spring Harbor
Laboratory Press, 1989).

The ColE1 and p15A replicons are particularly preferred for use in the present invention
because they each have the ability to direct the replication of plasmid in E. coli while the
other replicon is present in a second plasmid in the same E. coli cell. In other words,
ColE1 and p15A are non-interfering replicons that allow the maintenance of two
plasmids in the same host (see, for example, Sambrook, et al., supra, at pages 1.3-1.4).

In addition, those embodiments that include a prokaryotic replicon also include a gene
whose expression confers a selective advantage, such as drug resistance, to a bacterial
host transformed therewith. Typical bacterial drug resistance genes are those that confer
resistance to ampicillin, tetracycline, neomycin/kanamycin or chloramphenicol. Vectors
typically also contain convenient restriction sites for insertion of translatable DNA
sequences. Exemplary vectors are the plasmids pUC8, pUC9, pBR322, and pBR329
available from BioRad Laboratories (Richmond, CA), pPL and pKK223 available
from Pharmacia (Piscataway, NJ), and pBS (Stratagene, La Jolla, CA).

The vector comprises a first cassette that includes upstream and downstream translatable 
DNA sequences operatively linked via a sequence of nucleotides adapted for directional 
ligation to an insert DNA. The upstream translatable sequence encodes the secretion
signal as defined herein. The downstream translatable sequence encodes the filamentous
phage membrane anchor as defined herein. The cassette preferably includes DNA 
expression control sequences for expressing the zinc finger-derived polypeptide that is 
produced when an insert translatable DNA sequence (insert DNA) is directionally 
inserted into the cassette via the sequence of nucleotides adapted for directional ligation. 

The filamentous phage membrane anchor is preferably a domain of the cpIII or cpVIII
coat protein capable of binding the matrix of a filamentous phage particle, thereby
incorporating the fusion polypeptide onto the phage surface.

The zinc finger derived polypeptide expression vector also contains a second cassette for
expressing a second receptor polypeptide. The second cassette includes a second
translatable DNA sequence that encodes a secretion signal, as defined herein, operatively
linked at its 3' terminus via a sequence of nucleotides adapted for directional ligation to
a downstream DNA sequence of the vector that typically defines at least one stop codon
in the reading frame of the cassette. The second translatable DNA sequence is
operatively linked at its 5' terminus to DNA expression control sequences forming the
5' elements. The second cassette is capable, upon insertion of a translatable DNA
sequence (insert DNA), of expressing the second fusion polypeptide comprising a
receptor of the secretion signal with a polypeptide coded by the insert DNA. For
purposes of this invention, the second cassette sequences have been deleted.

As used herein, the term "vector" refers to a nucleic acid molecule capable of 
transporting between different genetic environments another nucleic acid to which it has 
been operatively linked. Preferred vectors are those capable of autonomous replication 
and expression of structural gene products present in the DNA segments to which they
are operatively linked. Vectors, therefore, preferably contain the replicons and selectable 
markers described earlier. 

As used herein with regard to DNA sequences or segments, the phrase "operatively
linked" means the sequences or segments have been covalently joined, preferably by
conventional phosphodiester bonds, into one strand of DNA, whether in single or double
stranded form. The choice of vector to which a transcription unit or a cassette of this
invention is operatively linked depends directly, as is well known in the art, on the
functional properties desired, e.g., vector replication and protein expression, and the host
cell to be transformed, these being limitations inherent in the art of constructing
recombinant DNA molecules.

A sequence of nucleotides adapted for directional ligation, i.e., a polylinker, is a region
of the DNA expression vector that (1) operatively links for replication and transport the
upstream and downstream translatable DNA sequences and (2) provides a site or means
for directional ligation of a DNA sequence into the vector. Typically, a directional
polylinker is a sequence of nucleotides that defines two or more restriction endonuclease
recognition sequences, or restriction sites. Upon restriction cleavage, the two sites yield
cohesive termini to which a translatable DNA sequence can be ligated to the DNA
expression vector. Preferably, the two restriction sites provide, upon restriction cleavage,
cohesive termini that are non-complementary and thereby permit directional insertion of
a translatable DNA sequence into the cassette. In one embodiment, the directional
ligation means is provided by nucleotides present in the upstream translatable DNA
sequence, downstream translatable DNA sequence, or both. In another embodiment, the
sequence of nucleotides adapted for directional ligation comprises a sequence of
nucleotides that defines multiple directional cloning means. Where the sequence of
nucleotides adapted for directional ligation defines numerous restriction sites, it is
referred to as a multiple cloning site.
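
The requirement that the two cohesive termini be non-complementary can be checked mechanically. The following is a minimal sketch, assuming palindromic six-base recognition sites that leave 5' overhangs; the enzymes listed are common examples chosen for illustration and are not stated in this passage to be the sites of any particular vector. The names `ENZYMES`, `overhang` and `permits_directional_insertion` are hypothetical.

```python
# Recognition site (5'->3', top strand) and the cut position after the given
# number of bases. Illustrative enzymes only; all leave 5' overhangs.
ENZYMES = {
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
    "SpeI": ("ACTAGT", 1),   # A^CTAGT
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
    "EcoRI": ("GAATTC", 1),  # G^AATTC
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def overhang(enzyme):
    """Single-stranded 5' overhang left by a palindromic cutter."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def permits_directional_insertion(enzyme_a, enzyme_b):
    """True if the two cohesive ends cannot anneal to each other."""
    return overhang(enzyme_a) != revcomp(overhang(enzyme_b))


if __name__ == "__main__":
    print(permits_directional_insertion("XhoI", "SpeI"))  # different overhangs
    print(permits_directional_insertion("SpeI", "XbaI"))  # identical overhangs
```

A pair that returns False would allow the insert to ligate in either orientation, or the vector to reclose on itself, so such a pair does not provide directional ligation.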

In a preferred embodiment, a DNA expression vector is designed for convenient
manipulation in the form of a filamentous phage particle encapsulating DNA encoding
the zinc finger-nucleotide binding polypeptides of the present invention. In this
embodiment, a DNA expression vector further contains a nucleotide sequence that
defines a filamentous phage origin of replication such that the vector, upon presentation
of the appropriate genetic complementation, can replicate as a filamentous phage in
single stranded replicative form and be packaged into filamentous phage particles. This
feature provides the ability of the DNA expression vector to be packaged into phage
particles for subsequent segregation of the particle, and vector contained therein, away
from other particles that comprise a population of phage particles using screening
techniques well known in the art.

A filamentous phage origin of replication is a region of the phage genome, as is well 
known, that defines sites for initiation of replication, termination of replication and 
packaging of the replicative form produced by replication (see, for example, Rasched,
et al., Microbiol. Rev., 50:401-427, 1986; and Horiuchi, J. Mol. Biol., 188:215-223,
1986).

A preferred filamentous phage origin of replication for use in the present invention is an
M13, f1 or fd phage origin of replication (Short, et al., Nucl. Acids Res., 16:7583-7600,
1988). Preferred DNA expression vectors are the expression vectors modified pCOMB3
and specifically pCOMB3.5.

The production of a DNA sequence encoding a zinc finger-nucleotide binding
polypeptide can be accomplished by oligonucleotide(s) which are primers for
amplification of the genomic polynucleotide encoding a zinc finger-nucleotide binding
polypeptide. These unique oligonucleotide primers can be produced based upon
identification of the flanking regions contiguous with the polynucleotide encoding the
zinc finger-nucleotide binding polypeptide. These oligonucleotide primers comprise
sequences which are capable of hybridizing with the flanking nucleotide sequence
encoding a zinc finger-nucleotide binding polypeptide and sequences complementary
thereto and can be used to introduce point mutations into the amplification products.

The primers of the invention include oligonucleotides of sufficient length and appropriate
sequence so as to provide specific initiation of polymerization on a significant number
of nucleic acids in the polynucleotide encoding the zinc finger-nucleotide binding
polypeptide. Specifically, the term "primer" as used herein refers to a sequence comprising
two or more deoxyribonucleotides or ribonucleotides, preferably more than three,
which sequence is capable of initiating synthesis of a primer extension product, which
is substantially complementary to a zinc finger-nucleotide binding protein strand, but can
also introduce mutations into the amplification products at selected residue sites.
Experimental conditions conducive to synthesis include the presence of nucleoside
triphosphates and an agent for polymerization and extension, such as DNA polymerase,
and a suitable buffer, temperature and pH. The primer is preferably single stranded for
maximum efficiency in amplification, but may be double stranded. If double stranded,
the primer is first treated to separate the two strands before being used to prepare
extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer
must be sufficiently long to prime the synthesis of extension products in the presence of
the inducing agent for polymerization and extension of the nucleotides. The exact length
of primer will depend on many factors, including temperature, buffer, and nucleotide
composition. The oligonucleotide primer typically contains 15-22 or more nucleotides,
although it may contain fewer nucleotides. Alternatively, as is well known in the art, the
mixture of nucleoside triphosphates can be biased to influence the formation of mutations
to obtain a library of cDNAs encoding putative zinc finger-nucleotide binding
polypeptides that can be screened in a functional assay for binding to a zinc finger-
nucleotide binding motif, such as one in a promoter in which the binding inhibits
transcriptional activation.

Primers of the invention are designed to be "substantially" complementary to a segment
of each strand of polynucleotide encoding the zinc finger-nucleotide binding protein to
be amplified. This means that the primers must be sufficiently complementary to
hybridize with their respective strands under conditions which allow the agent for
polymerization and nucleotide extension to act. In other words, the primers should have
sufficient complementarity with the flanking sequences to hybridize therewith and permit
amplification of the polynucleotide encoding the zinc finger-nucleotide binding protein.
Preferably, the primers have exact complementarity with the flanking sequence strand.

Oligonucleotide primers of the invention are employed in the amplification process
which is an enzymatic chain reaction that produces exponential quantities of
polynucleotide encoding the zinc finger-nucleotide binding polypeptide relative to the
number of reaction steps involved. Typically, one primer is complementary to the
negative (-) strand of the polynucleotide encoding the zinc finger-nucleotide binding
protein and the other is complementary to the positive (+) strand. Annealing the primers
to denatured nucleic acid followed by extension with an enzyme, such as the large
fragment of DNA Polymerase I (Klenow) and nucleotides, results in newly synthesized
(+) and (-) strands containing the zinc finger-nucleotide binding protein sequence.
Because these newly synthesized sequences are also templates, repeated cycles of
denaturing, primer annealing, and extension result in exponential production of the
sequence (i.e., the zinc finger-nucleotide binding protein polynucleotide sequence)
defined by the primer. The product of the chain reaction is a discrete nucleic acid duplex
with termini corresponding to the ends of the specific primers employed. Those of skill
in the art will know of other amplification methodologies which can also be utilized to
increase the copy number of target nucleic acid. These may include, for example, ligation
activated transcription (LAT), ligase chain reaction (LCR), and strand displacement
activation (SDA), although PCR is the preferred method.
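
The exponential relationship between cycle number and product can be made concrete with an idealized calculation. The following is a minimal sketch: it assumes every template is copied with the same per-cycle efficiency and ignores reagent depletion and the first cycles in which products of undefined length are formed. The function name `copies_after_cycles` and the `efficiency` parameter are hypothetical.

```python
def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies templates by (1 + efficiency).

    efficiency = 1.0 means perfect doubling; lower values model incomplete
    extension. This is an illustrative model, not a measured behavior.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(n, copies_after_cycles(1, n))
```

Under perfect doubling a single template yields about a thousand copies after 10 cycles and about a billion after 30, which is why even rare sequences can be recovered.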

The oligonucleotide primers of the invention may be prepared using any suitable method,
such as conventional phosphotriester and phosphodiester methods or automated
embodiments thereof. In one such automated embodiment, diethylphosphoramidites are
used as starting materials and may be synthesized as described by Beaucage, et al.
(Tetrahedron Letters, 22:1859-1862, 1981). One method for synthesizing
oligonucleotides on a modified solid support is described in U.S. Patent No. 4,458,066.
One method of amplification which can be used according to this invention is the
polymerase chain reaction (PCR) described in U.S. Patent Nos. 4,683,202 and 4,683,195.

Methods for utilizing filamentous phage libraries to obtain mutations of peptide
sequences are disclosed in U.S. Patent 5,223,409 to Ladner et al., which is incorporated
by reference herein in its entirety.

In one embodiment of the invention, randomized nucleotide substitutions can be
performed on the DNA encoding one or more fingers of a known zinc finger protein to
obtain a derived polypeptide that modifies gene expression upon binding to a site on the
DNA containing the gene, such as a transcriptional control element. In addition to
modifications in the amino acids making up the zinc finger, the zinc finger derived
polypeptide can contain more or less than the full amount of fingers contained in the wild
type protein from which it is derived.

While any method of site directed mutagenesis can be used to perform the mutagenesis,
preferably the method used to randomize the segment of the zinc finger protein to be
modified utilizes a pool of degenerate oligonucleotide primers containing a plurality of
triplet codons having the formula NNS or NNK (and its complement NNM), wherein S
is either G or C, K is either G or T, M is either C or A (the complement of NNK) and N
can be A, C, G or T. In addition to the degenerate triplet codons, the degenerate
oligonucleotide primers also contain at least one segment designed to hybridize to the
DNA encoding the wild type zinc finger protein on at least one end, and are utilized in
successive rounds of PCR amplification known in the art as overlap extension PCR so
as to create a specified region of degeneracy bracketed by the non-degenerate regions of
the primers in the primer pool.
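
The consequence of choosing NNK or NNS over fully random NNN codons can be seen by enumerating the codons each formula allows. The following is a minimal sketch, assuming the standard genetic code and the base definitions given above (S = G or C, K = G or T, N = any base); the function names `expand` and `summarize` are hypothetical.

```python
from itertools import product

# Standard genetic code, DNA alphabet, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Degenerate base symbols used in the text.
DEGENERATE = {"N": "ACGT", "S": "CG", "K": "GT", "M": "AC",
              "A": "A", "C": "C", "G": "G", "T": "T"}


def expand(pattern):
    """List every concrete codon matched by a degenerate triplet such as NNK."""
    return ["".join(bases) for bases in product(*(DEGENERATE[s] for s in pattern))]


def summarize(pattern):
    """Return (codon count, distinct amino acids encoded, stop codon count)."""
    encoded = [CODON_TABLE[c] for c in expand(pattern)]
    amino_acids = {aa for aa in encoded if aa != "*"}
    return len(encoded), len(amino_acids), encoded.count("*")


if __name__ == "__main__":
    for pattern in ("NNN", "NNK", "NNS"):
        print(pattern, summarize(pattern))
```

Both NNK and NNS still encode all twenty amino acids with 32 codons and a single stop codon (TAG), compared with 64 codons and three stop codons for NNN, so a library randomized at k positions has 32 to the power k nucleotide variants with fewer truncated clones.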

The methods of overlap PCR as used to randomize specific regions of a cDNA are well
known in the art and are further illustrated in Example 3 below. The degenerate products
of the overlap PCR reactions are pooled and gel purified, preferably by size exclusion
chromatography or gel electrophoresis, prior to ligation into a surface display phage
expression vector to form a library for subsequent screening against a known or putative
zinc finger-nucleotide binding motif.

The degenerate primers are utilized in successive rounds of PCR amplification known
in the art as overlap extension PCR so as to create a library of cDNA sequences encoding
putative zinc finger-derived DNA binding polypeptides. Usually the derived
polypeptides contain a region of degeneracy corresponding to the region of the finger
that binds to DNA (usually in the tip of the finger and in the α-helix region) bracketed
by non-degenerate regions corresponding to the conserved regions of the finger
necessary to maintain the three dimensional structure of the finger.

Any nucleic acid specimen, in purified or nonpurified form, can be utilized as the starting
nucleic acid for the above procedures, provided it contains, or is suspected of containing,
the specific nucleic acid sequence of a zinc finger-nucleotide binding protein of the
invention. Thus, the process may employ, for example, DNA or RNA, including
messenger RNA, wherein DNA or RNA may be single stranded or double stranded. In
the event that RNA is to be used as a template, enzymes and/or conditions optimal for
reverse transcribing the template to DNA would be utilized. In addition, a DNA-RNA
hybrid which contains one strand of each may be utilized. A mixture of nucleic acids
may also be employed, or the nucleic acids produced in a previous amplification reaction
herein, using the same or different primers, may be so utilized. The specific nucleic acid
sequence to be amplified, i.e., zinc finger-nucleotide binding protein sequence, may be
a fraction of a larger molecule or can be present initially as a discrete molecule, so that
the specific sequence constitutes the entire nucleic acid. It is not necessary that the
sequence to be amplified be present initially in a pure form; it may be a minor fraction
of a complex mixture, such as contained in whole human DNA or the DNA of any
organism. For example, the source of DNA includes prokaryotes, eukaryotes, viruses
and plants.

Where the target nucleic acid sequence of the sample contains two strands, it is necessary
to separate the strands of the nucleic acid before it can be used as the template. Strand
separation can be effected either as a separate step or simultaneously with the synthesis
of the primer extension products. This strand separation can be accomplished using
various suitable denaturing conditions, including physical, chemical, or enzymatic
means; the word "denaturing" includes all such means. One physical method of
separating nucleic acid strands involves heating the nucleic acid until it is denatured.
Typical heat denaturation may involve temperatures ranging from about 80°C to 105°C
for times ranging from about 1 to 10 minutes. Strand separation may also be induced by
an enzyme from the class of enzymes known as helicases or by the enzyme RecA, which
has helicase activity, and in the presence of riboATP, is known to denature DNA. The
reaction conditions suitable for strand separation of nucleic acids with helicases are
described by Kuhn Hoffmann-Berling (CSH-Quantitative Biology, 43:63, 1978) and
techniques for using RecA are reviewed in C. Radding (Ann. Rev. Genetics, 16:405-437,
1982).

If the nucleic acid containing the sequence to be amplified is single stranded, its
complement is synthesized by adding one or two oligonucleotide primers. If a single
primer is utilized, a primer extension product is synthesized in the presence of primer,
an agent for polymerization, and the four nucleoside triphosphates described below. The
product will be partially complementary to the single-stranded nucleic acid and will
hybridize with a single-stranded nucleic acid to form a duplex of unequal length strands
that may then be separated into single strands to produce two single separated
complementary strands. Alternatively, two primers may be added to the single-stranded
nucleic acid and the reaction carried out as described.

When complementary strands of nucleic acid or acids are separated, regardless of
whether the nucleic acid was originally double or single stranded, the separated strands
are ready to be used as a template for the synthesis of additional nucleic acid strands.
This synthesis is performed under conditions allowing hybridization of primers to
templates to occur. Generally synthesis occurs in a buffered aqueous solution, preferably
at a pH of 7-9, most preferably about 8. Preferably, a molar excess (for genomic nucleic
acid, usually about 10^8:1 primer:template) of the two oligonucleotide primers is added
to the buffer containing the separated template strands. It is understood, however, that
the amount of complementary strand may not be known if the process of the invention
is used for diagnostic applications, so that the amount of primer relative to the amount
of complementary strand cannot be determined with certainty. As a practical matter,
however, the amount of primer added will generally be in molar excess over the amount
of complementary strand (template) when the sequence to be amplified is contained in
a mixture of complicated long-chain nucleic acid strands. A large molar excess is
preferred to improve the efficiency of the process.
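
The order of magnitude of the primer excess quoted above can be checked with a short
calculation. The sketch below is illustrative only: the reaction quantities (100 pmol of
primer, 1 microgram of genomic DNA), the average mass of about 650 g/mol per base pair and
the haploid human genome size of about 3.2 x 10^9 bp are assumptions chosen for the
example, not values stated in this description.

```python
AVG_BP_MASS = 650.0        # g/mol per base pair of double-stranded DNA (approximation)
HUMAN_GENOME_BP = 3.2e9    # haploid human genome size in bp (approximation)


def template_moles(mass_g, length_bp):
    """Moles of double-stranded template of `length_bp` contained in `mass_g` grams."""
    return mass_g / (length_bp * AVG_BP_MASS)


def primer_excess(primer_mol, template_mass_g, template_bp=HUMAN_GENOME_BP):
    """Molar ratio of one primer to template molecules."""
    return primer_mol / template_moles(template_mass_g, template_bp)


if __name__ == "__main__":
    # Hypothetical reaction: 100 pmol of a primer and 1 microgram of genomic DNA.
    ratio = primer_excess(100e-12, 1e-6)
    print(f"primer:template is roughly {ratio:.1e}:1")
```

Under these assumed quantities the ratio falls between 10^8 and 10^9, in line with the
figure given for genomic nucleic acid.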

The deoxyribonucleotide triphosphates dATP, dCTP, dGTP, and dTTP are added to the
synthesis mixture, either separately or together with the primers, in adequate amounts
and the resulting solution is heated to about 90°-100°C from about 1 to 10 minutes,
preferably from 1 to 4 minutes. After this heating period, the solution is allowed to cool
to a temperature that is preferable for the primer hybridization. To the cooled mixture
is added an appropriate agent for effecting the primer extension reaction (called herein
"agent for polymerization"), and the reaction is allowed to occur under conditions known
in the art. The agent for polymerization may also be added together with the other
reagents if it is heat stable. This synthesis (or amplification) reaction may occur at room
temperature up to a temperature above which the agent for polymerization no longer
functions. Most conveniently the reaction occurs at room temperature.

The agent for polymerization may be any compound or system which will function to
accomplish the synthesis of primer extension products, including enzymes. Suitable
enzymes for this purpose include, for example, E. coli DNA polymerase I, Klenow
fragment of E. coli DNA polymerase I, T4 DNA polymerase, other available DNA
polymerases, polymerase muteins, reverse transcriptase, and other enzymes, including
heat-stable enzymes (i.e., those enzymes which perform primer extension after being
subjected to temperatures sufficiently elevated to cause denaturation). Suitable enzymes
will facilitate combination of the nucleotides in the proper manner to form the primer
extension products which are complementary to each zinc finger-nucleotide binding
protein nucleic acid strand. Generally, the synthesis will be initiated at the 3' end of each
primer and proceed in the 5' direction along the template strand, until synthesis
terminates, producing molecules of different lengths. There may be agents for
polymerization, however, which initiate synthesis at the 5' end and proceed in the other
direction, using the same process as described above.

The newly synthesized zinc finger-nucleotide binding polypeptide strand and its
complementary nucleic acid strand will form a double-stranded molecule under
hybridizing conditions described above and this hybrid is used in subsequent steps of the
process. In the next step, the newly synthesized double-stranded molecule is subjected
to denaturing conditions using any of the procedures described above to provide
single-stranded molecules.

The above process is repeated on the single-stranded molecules. Additional agent for
polymerization, nucleotides, and primers may be added, if necessary, for the reaction to
proceed under the conditions prescribed above. Again, the synthesis will be initiated at
one end of each of the oligonucleotide primers and will proceed along the single strands
of the template to produce additional nucleic acid. After this step, half of the extension
product will consist of the specific nucleic acid sequence bounded by the two primers.

The steps of denaturing and extension product synthesis can be repeated as often as
needed to amplify the zinc finger-nucleotide binding protein nucleic acid sequence to the
extent necessary for detection. The amount of the specific nucleic acid sequence
produced will accumulate in an exponential fashion.
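
The bookkeeping behind these two statements can be sketched with a small idealized
model. It assumes perfect efficiency (every strand is copied once per cycle) and a single
double-stranded starting template; the function name and the three strand classes are
labels introduced here for illustration only.

```python
def pcr_strand_counts(cycles, templates=1):
    """Return (original, long, short) strand counts after `cycles` ideal cycles.

    original: strands of the starting template
    long:     extension products copied from an original strand; defined at the
              primer end only, so their lengths vary
    short:    extension products bounded by both primers (the specific sequence)
    """
    original = 2 * templates
    long_products = 0
    short_products = 0
    for _ in range(cycles):
        # Each original strand templates one new long product.
        new_long = original
        # Each long or short product templates one new short product.
        new_short = long_products + short_products
        long_products += new_long
        short_products += new_short
    return original, long_products, short_products


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 30):
        print(n, pcr_strand_counts(n))
```

In this model the second cycle yields two new long products and two new short products,
so half of that round's extension product is the primer-bounded sequence. The long
products then grow only linearly (two per cycle), while the short products roughly double
each cycle, which is the exponential accumulation described above.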

Sequences amplified by the methods of the invention can be further evaluated, detected,
cloned, sequenced, and the like, either in solution or after binding to a solid support, by
any method usually applied to the detection of a specific DNA sequence such as PCR,
oligomer restriction (Saiki, et al., Bio/Technology, 3:1008-1012, 1985), allele-specific
oligonucleotide (ASO) probe analysis (Conner, et al., Proc. Natl. Acad. Sci. USA,
80:278, 1983), oligonucleotide ligation assays (OLAs) (Landegren, et al., Science,
241:1077, 1988), and the like. Molecular techniques for DNA analysis have been
reviewed (Landegren, et al., Science, 242:229-237, 1988). Preferably, novel zinc finger
derived-DNA binding polypeptides of the invention can be isolated utilizing the above
techniques wherein the primers allow modification, such as substitution, of nucleotides
such that unique zinc fingers are produced (See Examples for further detail).

In the present invention, the zinc finger-nucleotide binding polypeptide encoding
nucleotide sequences may be inserted into a recombinant expression vector. The term
"recombinant expression vector" refers to a plasmid, virus or other vehicle known in the
art that has been manipulated by insertion or incorporation of zinc finger derived-
nucleotide binding protein genetic sequences. Such expression vectors contain a
promoter sequence which facilitates the efficient transcription of the inserted genetic
sequence in the host. The expression vector typically contains an origin of replication,
a promoter, as well as specific genes which allow phenotypic selection of the
transformed cells. Vectors suitable for use in the present invention include, but are not
limited to, the T7-based expression vector for expression in bacteria (Rosenberg, et al.,
Gene, 56:125, 1987), the pMSXND expression vector for expression in mammalian cells
(Lee and Nathans, J. Biol. Chem., 263:3521, 1988) and baculovirus-derived vectors for
expression in insect cells. The DNA segment can be present in the vector operably
linked to regulatory elements, for example, a promoter (e.g., T7, metallothionein I, or
polyhedrin promoters).

DNA sequences encoding novel zinc finger-nucleotide binding polypeptides of the
invention can be expressed in vitro by DNA transfer into a suitable host cell. "Host
cells" are cells in which a vector can be propagated and its DNA expressed. The term
also includes any progeny of the subject host cell. It is understood that all progeny may
not be identical to the parental cell since there may be mutations that occur during
replication. However, such progeny are included when the term "host cell" is used.
Methods of stable transfer, in other words when the foreign DNA is continuously
maintained in the host, are known in the art.

Transformation of a host cell with recombinant DNA may be carried out by conventional
techniques as are well known to those skilled in the art. Where the host is prokaryotic,
such as E. coli, competent cells which are capable of DNA uptake can be prepared from
cells harvested after exponential growth phase and subsequently treated by the CaCl2
method by procedures well known in the art. Alternatively, MgCl2 or RbCl can be used.
Transformation can also be performed after forming a protoplast of the host cell or by
electroporation.

When the host is a eukaryote, such methods of transfection of DNA as calcium phosphate
co-precipitates, conventional mechanical procedures such as microinjection,
electroporation, insertion of a plasmid encased in liposomes, or virus vectors may be
used.



A variety of host-expression vector systems may be utilized to express the zinc finger
derived-nucleotide binding coding sequence. These include but are not limited to
microorganisms such as bacteria transformed with recombinant bacteriophage DNA,
plasmid DNA or cosmid DNA expression vectors containing a zinc finger derived-
nucleotide binding polypeptide coding sequence; yeast transformed with recombinant
yeast expression vectors containing the zinc finger-nucleotide binding coding sequence;
plant cell systems infected with recombinant virus expression vectors (e.g., cauliflower
mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with recombinant
plasmid expression vectors (e.g., Ti plasmid) containing a zinc finger derived-DNA
binding coding sequence; insect cell systems infected with recombinant virus expression
vectors (e.g., baculovirus) containing a zinc finger-nucleotide binding coding sequence;
or animal cell systems infected with recombinant virus expression vectors (e.g.,
retroviruses, adenovirus, vaccinia virus) containing a zinc finger derived-nucleotide
binding coding sequence, or transformed animal cell systems engineered for stable
expression. In such cases where glycosylation may be important, expression systems that
provide for translational and post-translational modifications may be used; e.g.,
mammalian, insect, yeast or plant expression systems.

Depending on the host/vector system utilized, any of a number of suitable transcription
and translation elements, including constitutive and inducible promoters, transcription
enhancer elements, transcription terminators, etc. may be used in the expression vector
(see e.g., Bitter, et al., Methods in Enzymology, 153:516-544, 1987). For example, when
cloning in bacterial systems, inducible promoters such as pL of bacteriophage λ, plac,
ptrp, ptac (ptrp-lac hybrid promoter) and the like may be used. When cloning in
mammalian cell systems, promoters derived from the genome of mammalian cells (e.g.,
metallothionein promoter) or from mammalian viruses (e.g., the retrovirus long terminal
repeat; the adenovirus late promoter; the vaccinia virus 7.5K promoter) may be used.
Promoters produced by recombinant DNA or synthetic techniques may also be used to
provide for transcription of the inserted zinc finger-nucleotide binding polypeptide
coding sequence.

In bacterial systems a number of expression vectors may be advantageously selected
depending upon the use intended for the zinc finger derived nucleotide-binding
polypeptide expressed. For example, when large quantities are to be produced, vectors
which direct the expression of high levels of fusion protein products that are readily
purified may be desirable. Those which are engineered to contain a cleavage site to aid
in recovering the protein are preferred. Such vectors include but are not limited to the
E. coli expression vector pUR278 (Ruther, et al., EMBO J., 2:1791, 1983), in which the
zinc finger-nucleotide binding protein coding sequence may be ligated into the vector in
frame with the lac Z coding region so that a hybrid zinc finger-lac Z protein is produced;
pIN vectors (Inouye & Inouye, Nucleic Acids Res., 13:3101-3109, 1985; Van Heeke &
Schuster, J. Biol. Chem., 264:5503-5509, 1989); and the like.
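
What "in frame" means for such a fusion can be made concrete with a small check. The
sketch below is a simplified illustration, not a description of any particular vector: it
assumes the upstream partner's coding sequence and the insert are given as plain DNA
strings read 5' to 3', and the function names and test sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a coding sequence into complete triplets, 5' to 3'."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def fusion_in_frame(upstream_cds, insert):
    """True if `insert` continues the reading frame of `upstream_cds`.

    The junction must fall on a codon boundary, the insert must be a whole
    number of codons, and no stop codon may occur before the final codon.
    """
    upstream_cds, insert = upstream_cds.upper(), insert.upper()
    if len(upstream_cds) % 3 != 0:
        return False  # junction falls inside a codon of the upstream partner
    if len(insert) % 3 != 0:
        return False  # insert would shift the frame of anything downstream
    triplets = codons(upstream_cds + insert)
    return not any(c in STOP_CODONS for c in triplets[:-1])


if __name__ == "__main__":
    print(fusion_in_frame("ATGGCC", "TGCGAATGA"))  # whole codons, terminal stop only
    print(fusion_in_frame("ATGGC", "TGCGAA"))      # junction splits a codon
```

A frame shift at the junction, or a premature stop codon, would prevent the hybrid
protein from being translated in full.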



In yeast a number of vectors containing constitutive or inducible promoters may be used.
For a review see Current Protocols in Molecular Biology, Vol. 2, 1988, Ed. Ausubel,
et al., Greene Publish. Assoc. & Wiley Interscience, Ch. 13; Grant, et al., 1987,
Expression and Secretion Vectors for Yeast, in Methods in Enzymology, Eds. Wu &
Grossman, 1987, Acad. Press, N.Y., Vol. 153, pp. 516-544; Glover, 1986, DNA
Cloning, Vol. II, IRL Press, Wash., D.C., Ch. 3; and Bitter, 1987, Heterologous Gene
Expression in Yeast, Methods in Enzymology, Eds. Berger & Kimmel, Acad. Press,
N.Y., Vol. 152, pp. 673-684; and The Molecular Biology of the Yeast Saccharomyces,
1982, Eds. Strathern et al., Cold Spring Harbor Press, Vols. I and II. A constitutive yeast
promoter such as ADH or LEU2 or an inducible promoter such as GAL may be used
(Cloning in Yeast, Ch. 3, R. Rothstein In: DNA Cloning Vol. 11, A Practical Approach,
Ed. DM Glover, 1986, IRL Press, Wash., D.C.). Alternatively, vectors may be used
which promote integration of foreign DNA sequences into the yeast chromosome.

In cases where plant expression vectors are used, the expression of a zinc finger-
nucleotide binding polypeptide coding sequence may be driven by any of a number of
promoters. For example, viral promoters such as the 35S RNA and 19S RNA promoters
of CaMV (Brisson, et al., Nature, 310:511-514, 1984), or the coat protein promoter of
TMV (Takamatsu, et al., EMBO J., 6:307-311, 1987) may be used; alternatively, plant
promoters such as the small subunit of RUBISCO (Coruzzi, et al., EMBO J., 3:1671-
1680, 1984; Broglie, et al., Science, 224:838-843, 1984); or heat shock promoters, e.g.,
soybean hsp17.5-E or hsp17.3-B (Gurley, et al., Mol. Cell. Biol., 6:559-565, 1986) may
be used. These constructs can be introduced into plant cells using Ti plasmids, Ri
plasmids, plant virus vectors, direct DNA transformation, microinjection, electroporation,
etc. For reviews of such techniques see, for example, Weissbach & Weissbach, Methods
for Plant Molecular Biology, Academic Press, NY, Section VIII, pp. 421-463, 1988; and
Grierson & Corey, Plant Molecular Biology, 2d Ed., Blackie, London, Ch. 7-9, 1988.

An alternative expression system that can be used to express a protein of the invention
is an insect system. In one such system, Autographa californica nuclear polyhedrosis
virus (AcNPV) is used as a vector to express foreign genes. The virus grows in
Spodoptera frugiperda cells. The zinc finger-nucleotide binding polypeptide coding
sequence may be cloned into non-essential regions (for example, the polyhedrin gene)
of the virus and placed under control of an AcNPV promoter (for
example, the polyhedrin promoter). Successful insertion of the zinc finger-nucleotide
binding polypeptide coding sequence will result in inactivation of the polyhedrin gene
and production of non-occluded recombinant virus (i.e., virus lacking the proteinaceous
coat coded for by the polyhedrin gene). These recombinant viruses are then used to
infect cells in which the inserted gene is expressed. (E.g., see Smith, et al., J. Biol
46:584, 1983; Smith, U.S. Patent No. 4,215,051).

Eukaryotic systems, and preferably mammalian expression systems, allow for proper
post-translational modifications of expressed mammalian proteins to occur. Therefore,
eukaryotic cells, such as mammalian cells that possess the cellular machinery for proper
processing of the primary transcript, glycosylation, phosphorylation, and, advantageously,
secretion of the gene product, are the preferred host cells for the expression of a zinc
finger derived-nucleotide binding polypeptide. Such host cell lines may include but are
not limited to CHO, VERO, BHK, HeLa, COS, MDCK, -293, and WI38.

Mammalian cell systems that utilize recombinant viruses or viral elements to direct
expression may be engineered. For example, when using adenovirus expression vectors,
the coding sequence of a zinc finger derived polypeptide may be ligated to an adenovirus
transcription/translation control complex, e.g., the late promoter and tripartite leader
sequence. This chimeric gene may then be inserted into the adenovirus genome by in
vitro or in vivo recombination. Insertion in a non-essential region of the viral genome
(e.g., region E1 or E3) will result in a recombinant virus that is viable and capable of
expressing the zinc finger polypeptide in infected hosts (e.g., see Logan & Shenk, Proc.
Natl. Acad. Sci. USA, 81:3655-3659, 1984). Alternatively, the vaccinia virus 7.5K
promoter may be used (e.g., see Mackett, et al., Proc. Natl. Acad. Sci. USA, 79:7415-
7419, 1982; Mackett, et al., J. Virol., 49:857-864, 1984; Panicali, et al., Proc. Natl. Acad.
Sci. USA, 79:4927-4931, 1982). Of particular interest are vectors based on bovine
papilloma virus which have the ability to replicate as extrachromosomal elements
(Sarver, et al., Mol. Cell. Biol., 1:486, 1981). Shortly after entry of this DNA into mouse
cells, the plasmid replicates to about 100 to 200 copies per cell. Transcription of the
inserted cDNA does not require integration of the plasmid into the host's chromosome,
thereby yielding a high level of expression. These vectors can be used for stable
expression by including a selectable marker in the plasmid, such as the neo gene.
Alternatively, the retroviral genome can be modified for use as a vector capable of
introducing and directing the expression of the zinc finger-nucleotide binding protein
gene in host cells (Cone & Mulligan, Proc. Natl. Acad. Sci. USA, 81:6349-6353, 1984).
High level expression may also be achieved using inducible promoters, including, but not
limited to, the metallothionein IIA promoter and heat shock promoters.

For long-term, high-yield production of recombinant proteins, stable expression is
preferred. Rather than using expression vectors which contain viral origins of
replication, host cells can be transformed with a cDNA controlled by appropriate
expression control elements (e.g., promoter, enhancer sequences, transcription
terminators, polyadenylation sites, etc.), and a selectable marker. The selectable marker
in the recombinant plasmid confers resistance to the selection and allows cells to stably
integrate the plasmid into their chromosomes and grow to form foci which in turn can be
cloned and expanded into cell lines. For example, following the introduction of foreign
DNA, engineered cells may be allowed to grow for 1-2 days in an enriched medium, and
then are switched to a selective medium. A number of selection systems may be used,
including but not limited to the herpes simplex virus thymidine kinase (Wigler, et al.,
Cell, 11:223, 1977), hypoxanthine-guanine phosphoribosyltransferase (Szybalska &
Szybalski, Proc. Natl. Acad. Sci. USA, 48:2026, 1962), and adenine
phosphoribosyltransferase (Lowy, et al., Cell, 22:817, 1980) genes, which can be
employed in tk-, hgprt- or aprt- cells respectively. Also, antimetabolite resistance-
conferring genes can be used as the basis of selection; for example, the genes for dhfr,
which confers resistance to methotrexate (Wigler, et al., Proc. Natl. Acad. Sci. USA,
77:3567, 1980; O'Hare, et al., Proc. Natl. Acad. Sci. USA, 78:1527, 1981); gpt, which
confers resistance to mycophenolic acid (Mulligan & Berg, Proc. Natl. Acad. Sci. USA,
78:2072, 1981); neo, which confers resistance to the aminoglycoside G-418
(Colberre-Garapin, et al., J. Mol. Biol., 150:1, 1981); and hygro, which confers
resistance to hygromycin (Santerre, et al., Gene, 30:147, 1984). Recently, additional
selectable genes have been described, namely trpB, which allows cells to utilize indole
in place of tryptophan; hisD, which allows cells to utilize histidinol in place of
histidine (Hartman & Mulligan, Proc. Natl. Acad. Sci. USA, 85:804, 1988); and ODC
(ornithine decarboxylase), which confers resistance to the ornithine decarboxylase
inhibitor, 2-(difluoromethyl)-DL-ornithine, DFMO (McConlogue L., In: Current
Communications in Molecular Biology, Cold Spring Harbor Laboratory ed., 1987).
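
The marker and selection pairings listed above can be collected into a small lookup
table, which may be convenient when planning a stable-expression experiment. The sketch
below only restates pairings from the preceding paragraph; the dictionary layout and the
function name are illustrative assumptions.

```python
# Resistance markers and the agent each one confers resistance to.
RESISTANCE_MARKERS = {
    "dhfr": "methotrexate",
    "gpt": "mycophenolic acid",
    "neo": "G-418",
    "hygro": "hygromycin",
    "ODC": "DFMO",
}

# Markers used by complementing a deficient host cell line.
COMPLEMENTING_MARKERS = {
    "tk": "tk- cells",
    "hgprt": "hgprt- cells",
    "aprt": "aprt- cells",
}


def selection_for(marker):
    """Describe the selection associated with a marker gene."""
    if marker in RESISTANCE_MARKERS:
        return "resistance to " + RESISTANCE_MARKERS[marker]
    if marker in COMPLEMENTING_MARKERS:
        return "complementation in " + COMPLEMENTING_MARKERS[marker]
    raise KeyError("marker not listed: " + marker)


if __name__ == "__main__":
    for name in ("neo", "dhfr", "tk"):
        print(name, "->", selection_for(name))
```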

Isolation and purification of microbially expressed protein, or fragments thereof provided
by the invention, may be carried out by conventional means including preparative
chromatography and immunological separations involving monoclonal or polyclonal
antibodies. Antibodies provided in the present invention are immunoreactive with the
zinc finger-nucleotide binding protein of the invention. Antibody which consists
essentially of pooled monoclonal antibodies with different epitopic specificities, as well
as distinct monoclonal antibody preparations are provided. Monoclonal antibodies are
made from antigen containing fragments of the protein by methods well known in the art
(Kohler, et al., Nature, 256:495, 1975; Current Protocols in Molecular Biology, Ausubel,
et al., ed., 1989).



The present invention also provides gene therapy for the treatment of cell proliferative
disorders which are associated with a cellular nucleotide sequence containing a zinc
finger-nucleotide binding motif. Such therapy would achieve its therapeutic effect by
introduction of the zinc finger-nucleotide binding polypeptide polynucleotide into cells
of animals having the proliferative disorder. Delivery of a polynucleotide encoding a
zinc finger-nucleotide binding protein can be achieved using a recombinant expression
vector such as a chimeric virus or a colloidal dispersion system, for example.

The term "cell-proliferative disorder" denotes malignant as well as non-malignant cell
populations which morphologically often appear to differ from the surrounding tissue.
The cell-proliferative disorder may be a transcriptional disorder which results in an
increase or a decrease in gene expression level. The cause of the disorder may be of
cellular origin or viral origin. Gene therapy using a zinc finger-nucleotide binding
polypeptide can be used to treat a virus-induced cell proliferative disorder in a human,
for example, as well as in a plant. Treatment can be prophylactic in order to make a plant
cell, for example, resistant to a virus, or therapeutic, in order to ameliorate an established
infection in a cell, by preventing production of viral products.



A polynucleotide encoding the zinc finger-nucleotide binding polypeptide is useful in
treating malignancies of the various organ systems, such as, for example, lung, breast,
lymphoid, gastrointestinal, and genito-urinary tract as well as adenocarcinomas which
include malignancies such as most colon cancers, renal-cell carcinoma, prostate cancer,
non-small cell carcinoma of the lung, cancer of the small intestine, and cancer of the
esophagus. A polynucleotide encoding the zinc finger-nucleotide binding polypeptide
is also useful in treating non-malignant cell-proliferative diseases such as psoriasis,
pemphigus vulgaris, Behcet's syndrome, and lipid histiocytosis. Essentially, any disorder
which is etiologically linked to the activation of a zinc finger-nucleotide binding motif
containing promoter, structural gene, or RNA, would be considered susceptible to
treatment with a polynucleotide encoding a derivative or variant zinc finger derived-
nucleotide binding polypeptide.



Various viral vectors that can be utilized for gene therapy as taught herein include
adenovirus, herpes virus, vaccinia, or, preferably, an RNA virus such as a retrovirus.
Preferably, the retroviral vector is a derivative of a murine or avian retrovirus. Examples
of retroviral vectors in which a single foreign gene can be inserted include, but are not
limited to: Moloney murine leukemia virus (MoMuLV), Harvey murine sarcoma virus
(HaMuSV), murine mammary tumor virus (MuMTV), and Rous Sarcoma Virus (RSV).
A number of additional retroviral vectors can incorporate multiple genes. All of these
vectors can transfer or incorporate a gene for a selectable marker so that transduced cells
can be identified and generated. By inserting a zinc finger derived-DNA binding
polypeptide sequence of interest into the viral vector, along with another gene that
encodes the ligand for a receptor on a specific target cell, for example, the vector is made
target specific. Retroviral vectors can be made target specific by inserting, for example,
a polynucleotide encoding a protein. Preferred targeting is accomplished by using an
antibody to target the retroviral vector. Those of skill in the art will know of, or can
readily ascertain without undue experimentation, specific polynucleotide sequences
which can be inserted into the retroviral genome to allow target specific delivery of the
retroviral vector containing the zinc finger-nucleotide binding protein polynucleotide.

Since recombinant retroviruses are defective, they require assistance in order to produce
infectious vector particles. This assistance can be provided, for example, by using helper
cell lines that contain plasmids encoding all of the structural genes of the retrovirus under
the control of regulatory sequences within the LTR. These plasmids are missing a
nucleotide sequence which enables the packaging mechanism to recognize an RNA
transcript for encapsidation. Helper cell lines which have deletions of the packaging
signal include but are not limited to Ψ2, PA317 and PA12, for example. These cell lines
produce empty virions, since no genome is packaged. If a retroviral vector is introduced
into such cells in which the packaging signal is intact, but the structural genes are
replaced by other genes of interest, the vector can be packaged and vector virions
produced. The vector virions produced by this method can then be used to infect a tissue
cell line, such as NIH 3T3 cells, to produce large quantities of chimeric retroviral virions.

Another targeted delivery system for polynucleotides encoding zinc finger derived-DNA
binding polypeptides is a colloidal dispersion system. Colloidal dispersion systems
include macromolecule complexes, nanocapsules, microspheres, beads, and lipid-based
systems including oil-in-water emulsions, micelles, mixed micelles, and liposomes. The
preferred colloidal system of this invention is a liposome. Liposomes are artificial
membrane vesicles which are useful as delivery vehicles in vitro and in vivo. It has been
shown that large unilamellar vesicles (LUV), which range in size from 0.2-4.0 µm, can
encapsulate a substantial percentage of an aqueous buffer containing large
macromolecules. RNA, DNA and intact virions can be encapsulated within the aqueous
interior and be delivered to cells in a biologically active form (Fraley, et al., Trends
Biochem. Sci., 6:77, 1981). In addition to mammalian cells, liposomes have been used
for delivery of polynucleotides in plant, yeast and bacterial cells. In order for a liposome
to be an efficient gene transfer vehicle, the following characteristics should be present:




(1) encapsulation of the genes of interest at high efficiency while not compromising their
biological activity; (2) preferential and substantial binding to a target cell in comparison
to non-target cells; (3) delivery of the aqueous contents of the vesicle to the target cell
cytoplasm at high efficiency; and (4) accurate and effective expression of genetic
information (Mannino, et al., Biotechniques, 6:682, 1988).

The composition of the liposome is usually a combination of phospholipids, particularly
high-phase-transition-temperature phospholipids, usually in combination with steroids,
especially cholesterol. Other phospholipids or other lipids may also be used. The
physical characteristics of liposomes depend on pH, ionic strength, and the presence of
divalent cations.

Examples of lipids useful in liposome production include phosphatidyl compounds, such
as phosphatidylglycerol, phosphatidylcholine, phosphatidylserine,
phosphatidylethanolamine, sphingolipids, cerebrosides, and gangliosides. Particularly
useful are diacylphosphatidylglycerols, where the lipid moiety contains from 14-18 carbon
atoms, particularly from 16-18 carbon atoms, and is saturated. Illustrative phospholipids
include egg phosphatidylcholine, dipalmitoylphosphatidylcholine and
distearoylphosphatidylcholine.

The targeting of liposomes has been classified based on anatomical and mechanistic
factors. Anatomical classification is based on the level of selectivity, for example,
organ-specific, cell-specific, and organelle-specific. Mechanistic targeting can be
distinguished based upon whether it is passive or active. Passive targeting utilizes the
natural tendency of liposomes to distribute to cells of the reticuloendothelial system (RES)
in organs which contain sinusoidal capillaries. Active targeting, on the other hand, involves
alteration of the liposome by coupling the liposome to a specific ligand such as a
monoclonal antibody, sugar, glycolipid, or protein, or by changing the composition or
size of the liposome in order to achieve targeting to organs and cell types other than the
naturally occurring sites of localization.

The surface of the targeted delivery system may be modified in a variety of ways. In the
case of a liposomal targeted delivery system, lipid groups can be incorporated into the
lipid bilayer of the liposome in order to maintain the targeting ligand in stable association
with the liposomal bilayer. Various linking groups can be used for joining the lipid
chains to the targeting ligand.

In general, the compounds bound to the surface of the targeted delivery system will be
ligands and receptors which will allow the targeted delivery system to find and "home
in" on the desired cells. A ligand may be any compound of interest which will bind to
another compound, such as a receptor.

In general, surface membrane proteins which bind to specific effector molecules are
referred to as receptors. In the present invention, antibodies are preferred receptors.
Antibodies can be used to target liposomes to specific cell-surface ligands. For example,
certain antigens expressed specifically on tumor cells, referred to as tumor-associated
antigens (TAAs), may be exploited for the purpose of targeting antibody-zinc finger-nucleotide
binding protein-containing liposomes directly to the malignant tumor. Since
the zinc finger-nucleotide binding protein gene product may be indiscriminate with
respect to cell type in its action, a targeted delivery system offers a significant
improvement over randomly injecting non-specific liposomes. A number of procedures
can be used to covalently attach either polyclonal or monoclonal antibodies to a liposome
bilayer. Antibody-targeted liposomes can include monoclonal or polyclonal antibodies
or fragments thereof such as Fab or F(ab')2, as long as they bind efficiently to an
antigenic epitope on the target cells. Liposomes may also be targeted to cells expressing
receptors for hormones or other serum factors.

In another embodiment, the invention provides a method for obtaining an isolated zinc
finger-nucleotide binding polypeptide variant which binds to a cellular nucleotide
sequence comprising, first, identifying the amino acids in a zinc finger-nucleotide
binding polypeptide that bind to a first cellular nucleotide sequence and modulate the
function of the nucleotide sequence. Second, an expression library encoding the
polypeptide variant containing randomized substitution of the amino acids identified in
the first step is created. Third, the library is expressed in a suitable host cell, which will
be apparent to those of skill in the art, and finally, a clone is isolated that produces a
polypeptide variant that binds to a second cellular nucleotide sequence and modulates the
function of the second nucleotide sequence. The invention also includes a zinc finger-nucleotide
binding polypeptide variant produced by the method described above.

Preferably, a phage surface expression system, as described in the Examples of the
present disclosure, is utilized as the library. The phage library is treated with a reducing
reagent, such as dithiothreitol, which allows proper folding of the expression product on
the phage surface. The library is made from polynucleotide sequences which encode a
zinc finger-nucleotide binding polypeptide variant and which have been randomized,
preferably by PCR using primers containing degenerate triplet codons at sequence
locations corresponding to the determined amino acids in the first step of the method.
The degenerate triplet codons have the formula NNS or NNK, wherein S is either G or
C, K is either G or T, and N is independently selected from the group consisting of A, C,
G, or T.
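
The degeneracy of the NNS and NNK formulas can be checked by direct enumeration. The following is a minimal sketch, not part of the disclosed method: it assumes the standard genetic code, and the helper names `expand` and `summarize` are illustrative only. It expands each degenerate codon and counts the concrete codons, the distinct amino acids encoded, and the stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (assumed), codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degenerate symbols as defined in the text: N = A/C/G/T, S = G/C, K = G/T.
DEGENERATE = {"N": "ACGT", "S": "GC", "K": "GT"}


def expand(pattern):
    """Return every concrete codon matching a degenerate pattern such as 'NNK'."""
    return ["".join(c) for c in product(*(DEGENERATE[s] for s in pattern))]


def summarize(pattern):
    """Return (number of codons, number of amino acids, number of stop codons)."""
    encoded = [CODON_TABLE[c] for c in expand(pattern)]
    amino_acids = set(encoded) - {"*"}
    return len(encoded), len(amino_acids), encoded.count("*")


for pattern in ("NNK", "NNS"):
    n_codons, n_aa, n_stop = summarize(pattern)
    print(pattern, n_codons, "codons,", n_aa, "amino acids,", n_stop, "stop codon(s)")
```

Under these assumptions both formulas reduce the 64 possible codons to 32 while still encoding all 20 amino acids, and each retains a single stop codon (TAG).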

The modulation of the function of the cellular nucleotide sequence includes the
enhancement or suppression of transcription of a gene operatively linked to the cellular
nucleotide sequence, particularly when the nucleotide sequence is a promoter. The
modulation also includes suppression of transcription of a nucleotide sequence which is
within a structural gene or a virus DNA or RNA sequence. Modulation also includes
inhibition of translation of a messenger RNA.

In addition, the invention discloses a method of treating a cell proliferative disorder, by
the ex vivo introduction of a recombinant expression vector comprising the
polynucleotide encoding a zinc finger-nucleotide binding polypeptide into a cell to
modulate in a cell the function of a nucleotide sequence comprising a zinc finger-nucleotide
binding motif. The cell proliferative disorder comprises those disorders as
described above which are typically associated with transcription of a gene at reduced
or increased levels. The method of the invention offers a technique for modulating such
gene expression, whether at the promoter, structural gene, or RNA level. The method
includes the removal of a tissue sample from a subject with the disorder, isolating
hematopoietic or other cells from the tissue sample, and contacting isolated cells with a
recombinant expression vector containing the DNA encoding zinc finger-nucleotide
binding protein and, optionally, a target specific gene. Optionally, the cells can be
treated with a growth factor, such as interleukin-2 for example, to stimulate cell growth,
before reintroducing the cells into the subject. When reintroduced, the cells will
specifically target the cell population from which they were originally isolated. In this
way, the trans-repressing activity of the zinc finger-nucleotide binding polypeptide may
be used to inhibit or suppress undesirable cell proliferation in a subject. In certain cases,
modulation of the nucleotide sequence in a cell refers to suppression or enhancement of
the transcription of a gene operatively linked to a cellular nucleotide sequence.
Preferably, the subject is a human.

An alternative use for recombinant retroviral vectors comprises the introduction of
polynucleotide sequences into the host by means of skin transplants of cells containing
the virus. Long term expression of foreign genes in implants, using cells of fibroblast
origin, may be achieved if a strong housekeeping gene promoter is used to drive
transcription. For example, the dihydrofolate reductase (DHFR) gene promoter may be
used. Cells, such as fibroblasts, can be infected with virions containing a retroviral
construct containing the gene of interest, for example a truncated and/or mutagenized
zinc finger-nucleotide binding protein, together with a gene which allows for specific
targeting, such as tumor-associated antigen (TAA), and a strong promoter. The infected
cells can be embedded in a collagen matrix which can be grafted into the connective
tissue of the dermis in the recipient subject. As the retrovirus proliferates and escapes
the matrix it will specifically infect the target cell population. In this way the
transplantation results in increased amounts of trans-repressing zinc finger-nucleotide
binding polypeptide being produced in cells manifesting the cell proliferative disorder.

The novel zinc finger-nucleotide binding proteins of the invention, which modulate
transcriptional activation or translation either at the promoter, structural gene, or RNA
level, could be used in plant species as well. Transgenic plants would be produced such
that the plant is resistant to particular bacterial or viral pathogens, for example. Methods
for transferring and expressing nucleic acids in plants are well known in the art. (See for
example, Hiatt, et al., U.S. Patent No. 5,202,422, incorporated herein by reference.)

In a further embodiment, the invention provides a method for identifying a modulating
polypeptide derived from a zinc finger-nucleotide binding polypeptide that binds to a
zinc finger-nucleotide binding motif of interest comprising incubating components,
comprising a nucleotide sequence encoding the putative modulating protein operably
linked to a first inducible promoter and a reporter gene operably linked to a second
inducible promoter and a zinc finger-nucleotide binding motif, wherein the incubating
is carried out under conditions sufficient to allow the components to interact, and
measuring the effect of the putative modulating protein on the expression of the reporter
gene.

The term "modulating" envisions the inhibition or suppression of expression from a
promoter containing a zinc finger-nucleotide binding motif when it is over-activated, or
augmentation or enhancement of expression from such a promoter when it is under-activated.
A first inducible promoter, such as the arabinose promoter, is operably linked
to the nucleotide sequence encoding the putative modulating polypeptide. A second
inducible promoter, such as the lactose promoter, is operably linked to a zinc finger
derived-DNA binding motif followed by a reporter gene, such as β-galactosidase.

Incubation of the components may be in vitro or in vivo. In vivo incubation may include
prokaryotic or eukaryotic systems, such as E. coli or COS cells, respectively. Conditions
which allow the assay to proceed include incubation in the presence of a substance, such
as arabinose and lactose, which activate the first and second inducible promoters,
respectively, thereby allowing expression of the nucleotide sequence encoding the
putative trans-modulating protein. Whether or not the putative
modulating protein binds to the zinc finger-nucleotide binding motif which is operably
linked to the second inducible promoter, and affects its activity, is measured by the
expression of the reporter gene. For example, if the reporter gene was β-galactosidase,
the presence of blue or white plaques would indicate whether the putative modulating
protein enhances or inhibits, respectively, gene expression from the promoter. Other
commonly used assays to assess the function from a promoter, including the
chloramphenicol acetyl transferase (CAT) assay, will be known to those of skill in the
art. Both prokaryote and eukaryote systems can be utilized.

The invention is useful for the identification of a novel zinc finger-nucleotide binding
polypeptide derivative or variant and the nucleotide sequence encoding the polypeptide.
The method entails modification of the fingers of a wild type zinc finger protein so that
they recognize a nucleotide, either DNA or RNA, sequence other than the sequence
originally recognized by that protein. For example, it may be desirable to modify a
known zinc finger protein to produce a new zinc finger-nucleotide binding polypeptide
that recognizes, binds to, and inactivates the promoter region (LTR) of human
immunodeficiency virus (HIV). Following identification of the protein, a truncated form
of the protein is produced that represses transcription normally activated from that site.
In HIV, the target site for a zinc finger-nucleotide binding motif within the promoter is
CTG-TTG-TGT. The three fingers of zif268, for example, are mutagenized, as described
in the examples. The fingers are mutagenized independently on the same protein (one
by one), or independently or "piecewise" on three different zif268 molecules and
religated after being mutagenized. Although one of these two methods is preferable, an
alternative method would allow the three fingers to be mutagenized simultaneously.
After mutagenesis, a phage display library is constructed and screened with the appropriate
oligonucleotides which include the binding site of interest. If the fingers were
mutagenized independently on the same protein, sequential libraries are constructed and
panning performed after each library construction. For example, in zif268, a finger 3
library is constructed and panned with a finger 3 specific oligo; the positive clones from
this screen are collected and utilized to make a finger 2 library (using finger 1 library
DNA as a template); panning is performed with a finger 32 specific oligo; DNA is
collected from positive clones and used as a template for finger 1 library construction;
finally selection for a protein with 3 new fingers is performed with a finger 321 specific
oligo. The method results in identification of a new zinc finger derived-DNA binding
protein that recognizes, binds to, and represses transcription from the HIV promoter.
Subsequent truncation, mutation, or expansion of various fingers of the new protein
would result in a protein which represses transcription from the HIV promoter.
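
The sequential panning scheme can be illustrated by listing the binding site that each selection oligo would carry as the wild-type zif268 site GCG-TGG-GCG is converted, one subsite per round, into the HIV target CTG-TTG-TGT. The sketch below is illustrative only. It assumes that finger 3 reads the 5' triplet, finger 2 the middle triplet and finger 1 the 3' triplet of the site as written, an orientation that is not spelled out in this passage, and the function name `panning_sites` is hypothetical.

```python
WILD_TYPE_SITE = ("GCG", "TGG", "GCG")  # zif268 site, written 5' to 3'
HIV_TARGET = ("CTG", "TTG", "TGT")      # HIV promoter target site quoted in the text

# Assumption (not stated in this passage): finger 3 reads the 5' triplet,
# finger 2 the middle triplet, and finger 1 the 3' triplet.
FINGER_TO_TRIPLET_INDEX = {3: 0, 2: 1, 1: 2}


def panning_sites(wild_type, target, order=(3, 2, 1)):
    """Return (finger, site) pairs, one per round of library construction.

    In each round the subsite of the finger being randomized is switched from
    the wild-type triplet to the target triplet; subsites already converted in
    earlier rounds stay converted.
    """
    site = list(wild_type)
    rounds = []
    for finger in order:
        index = FINGER_TO_TRIPLET_INDEX[finger]
        site[index] = target[index]
        rounds.append((finger, "-".join(site)))
    return rounds


for finger, site in panning_sites(WILD_TYPE_SITE, HIV_TARGET):
    print(f"finger {finger} library panned against {site}")
```

Under the stated assumption, the three sites correspond to the "finger 3", "finger 32" and "finger 321" specific oligos of the scheme, the last being the complete HIV target.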

The invention provides, in EXAMPLES 7-13, an illustration of modification of Zif268
as described above. Therefore, in another embodiment, the invention provides a novel
zinc finger-nucleotide binding polypeptide variant comprising at least two zinc finger
modules that bind to an HIV sequence and modulate the function of the HIV sequence,
for example, the HIV promoter sequence.

The identification of novel zinc finger-nucleotide binding proteins allows modulation of
gene expression from promoters to which these proteins bind. For example, when a cell
proliferative disorder is associated with overactivation of a promoter which contains a
zinc finger-nucleotide binding motif, such suppressive reagents as antisense
polynucleotide sequence or binding antibody can be introduced to a cell, as an alternative
to the addition of a zinc finger-nucleotide binding protein derivative. Alternatively,
when a cell proliferative disorder is associated with underactivation of the promoter, a
sense polynucleotide sequence (the DNA coding strand) or zinc finger-nucleotide
binding polypeptide can be introduced into the cell.

Minor modifications of the primary amino acid sequence may result in proteins which
have substantially equivalent activity compared to the zinc finger derived-binding protein
described herein. Such modifications may be deliberate, as by site-directed mutagenesis,
or may be spontaneous. All proteins produced by these modifications are included herein
as long as zinc finger-nucleotide binding protein activity exists.

In another embodiment, zinc finger proteins of the invention can be manipulated to
recognize and bind to extended target sequences. For example, zinc finger proteins
containing from about 2 to 20 zinc fingers, Zif(2) to Zif(20), and preferably from about
2 to 12 zinc fingers, may be fused to the leucine zipper domains of the Jun/Fos proteins,
prototypical members of the bZIP family of proteins (O'Shea, et al., Science, 254:539,
1991). Alternatively, zinc finger proteins can be fused to other proteins which are
capable of forming heterodimers and contain dimerization domains. Such proteins will
be known to those of skill in the art.
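
The length of the sequence recognized by such a dimer follows from simple arithmetic. As a rough sketch, assuming each zinc finger reads 3 base pairs (consistent with three-finger proteins binding 9 contiguous bp) and ignoring any spacer between the two half-sites, the preferred range of 2 to 12 fingers per monomer gives the 12 to 72 base pairs quoted for the Jun/Fos heterodimers. The function name `recognized_bp` is illustrative only.

```python
BP_PER_FINGER = 3  # assumed: one finger reads one 3 bp subsite


def recognized_bp(fingers_per_monomer, monomers=2):
    """Base pairs contacted by a dimer of zinc finger proteins.

    The spacer between the two half-sites is not counted.
    """
    if fingers_per_monomer < 1 or monomers < 1:
        raise ValueError("finger and monomer counts must be positive")
    return fingers_per_monomer * BP_PER_FINGER * monomers


for fingers in (2, 3, 6, 12):
    print(fingers, "fingers per monomer:", recognized_bp(fingers), "bp")
```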

The Jun/Fos leucine zippers are described for illustrative purposes and preferentially
form heterodimers and allow for the recognition of 12 to 72 base pairs. Henceforth,
Jun/Fos refer to the leucine zipper domains of these proteins. Zinc finger proteins are
fused to Jun, and independently to Fos, by methods commonly used in the art to link
proteins. Following purification of the Zif-Jun and Zif-Fos constructs (SEQ ID NOS: 33,
34 and 35, 36 respectively), the proteins are mixed to spontaneously form a Zif-Jun/Zif-Fos
heterodimer. Alternatively, coexpression of the genes encoding these proteins
results in the formation of Zif-Jun/Zif-Fos heterodimers in vivo. Fusion of the
heterodimer with an N-terminal nuclear localization signal allows for targeting of
expression to the nucleus (Calderon, et al., Cell, 41:499, 1982). Activation domains may
also be incorporated into one or each of the leucine zipper fusion constructs to produce
activators of transcription (Sadowski, et al., Gene, 118:137, 1992). These dimeric
constructs then allow for specific activation or repression of transcription. These
heterodimeric Zif constructs are advantageous since they allow for recognition of
palindromic sequences (if the fingers on both Jun and Fos recognize the same DNA/RNA
sequence) or extended asymmetric sequences (if the fingers on Jun and Fos recognize
different DNA/RNA sequences). For example the palindromic sequence

5' - CGC CCA CGC (N)x GCG TGG GCG - 3'
3' - GCG GGT GCG (N)x CGC ACC CGC - 5' (SEQ ID NO: 37)

is recognized by the Zif268-Fos/Zif268-Jun dimer (x is any number). The spacing
between subsites is determined by the site of fusion of Zif with the Jun or Fos zipper
domains and the length of the linker between the Zif and zipper domains. Subsite
spacing is determined by a binding site selection method as is common to those skilled
in the art (Thiesen, et al., Nucleic Acids Research, 18:3203, 1990). An example of the
recognition of an extended asymmetric sequence is shown by the Zif(C7)6-Jun/Zif268-Fos
dimer. This protein consists of six fingers of the C7 type (EXAMPLE 11) linked to Jun
and three fingers of Zif268 linked to Fos, and recognizes the extended sequence:

5' - CGC CGC CGC CGC CGC CGC (N)x GCG TGG GCG - 3'
3' - GCG GCG GCG GCG GCG GCG (N)x CGC ACC CGC - 5'
(SEQ ID NO: 38)
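
The distinction between palindromic and extended asymmetric sites can be checked mechanically. The following sketch is illustrative only; the helper names are hypothetical. It takes the half-sites of the bottom strands of SEQ ID NO: 37 and SEQ ID NO: 38 as printed (3' to 5'), derives the top strand by base complementarity, and tests whether the right half-site is the reverse complement of the left half-site. The spacer (N)x is left out of the comparison.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, without reversing the order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    return complement(seq)[::-1]


# Half-sites of the bottom strands as printed, read 3' to 5'.
PALINDROMIC_BOTTOM = ("GCGGGTGCG", "CGCACCCGC")
ASYMMETRIC_BOTTOM = ("GCGGCGGCGGCGGCGGCG", "CGCACCCGC")


def top_strand(bottom_halves):
    """Top-strand half-sites (5' to 3') paired with a bottom strand written 3' to 5'."""
    return tuple(complement(half) for half in bottom_halves)


def is_palindromic(top_halves):
    """True when the right half-site is the reverse complement of the left one."""
    left, right = top_halves
    return reverse_complement(left) == right


for name, bottom in (("SEQ ID NO: 37", PALINDROMIC_BOTTOM),
                     ("SEQ ID NO: 38", ASYMMETRIC_BOTTOM)):
    top = top_strand(bottom)
    print(name, top, "palindromic" if is_palindromic(top) else "asymmetric")
```

In the palindromic case both half-sites present the same GCG-TGG-GCG sequence to the fingers, one on each strand. In the asymmetric case the left half-site presents six GCG subsites on the bottom strand while the right half-site presents the Zif268 site on the top strand.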

Oxidative or hydrolytic cleavage of DNA or RNA with metal chelate complexes can be
performed by methods known to those skilled in the art. In another embodiment,
attachment of chelating groups to Zif proteins is preferably facilitated by the
incorporation of a Cysteine (Cys) residue between the initial Methionine (Met) and the
first Tyrosine (Tyr) of the protein. The Cys is then alkylated with chelators known to
those skilled in the art, for example, EDTA derivatives as described (Sigman,
Biochemistry, 29:9097, 1990). Alternatively the sequence Gly-Gly-His can be made as
the most amino terminal residues since an amino terminus composed of the residues has
been described to chelate Cu+2 (Mack, et al., J. Am. Chem. Soc., 110:7572, 1988).
Preferred metal ions include Cu+2, Ce+3 (Takasaki and Chin, J. Am. Chem. Soc.,
116:1121, 1994), Zn+2, Cd+2, Pb+2, Fe+2 (Schnaith, et al., Proc. Natl. Acad. Sci., USA,
91:569, 1994), Fe+3, Ni+2, Ni+3, La+3, Eu+3 (Hall, et al., Chemistry and Biology,
1:185, 1994), Gd+3, Tb+3, Lu+3, Mn+2, Mg+2. Cleavage with chelated metals is
generally performed in the presence of oxidizing agents such as O2, hydrogen peroxide
(H2O2) and reducing agents such as thiols and ascorbate. The site and strand (+ or - site)
of cleavage is determined empirically (Mack, et al., J. Am. Chem. Soc., 110:7572, 1988)
and is dependent on the position of the Cys between the Met and the Tyr preceding the
first finger. In the protein Met-(AA)x-Tyr-(Zif)1-12, the chelate becomes
Met-(AA)x1-Cys-Chelate-(AA)x2-Tyr-(Zif)1-12, where AA = any amino acid and x = the number of amino
acids. Dimeric Zif constructs of the type Zif-Jun/Zif-Fos are preferred for cleavage at
two sites within the target oligonucleotide or at a single long target site. In the case
where double-stranded cleavage is desired, both Jun and Fos containing proteins are
labelled with chelators and cleavage is performed by methods known to those skilled in
the art. In this case, a staggered double-stranded cut analogous to that produced by
restriction enzymes is generated.

Following mutagenesis and selection of variants of the Zif268 protein in which the finger
1 specificity or affinity is modified, proteins carrying multiple copies of the finger may
be constructed using the TGEKP linker sequence by methods known in the art. For
example, the C7 finger may be constructed according to the scheme:

MKLLEPYACPVESCDRRFSKSADLKRHIRHTGEKP-
(YACPVESCDRRFSKSADLKHIRIHIGEKP) (SEQ ID NO: 39)

where the sequence of the last linker is subject to change since it is at the terminus and
not involved in linking two fingers together. This protein binds the designed target sequence
GCG-GCG-GCG (SEQ ID NO: 32) in the oligonucleotide hairpin
CCT-CGC-CGC-CGC-GGG-TTT-TCC-CGC-GCC-CCC GAG G (SEQ ID NO: 40) with an affinity of 9 nM, as
compared to an affinity of 300 nM for an oligonucleotide encoding the GCG-TGG-GCG
sequence (as determined by surface plasmon resonance studies). Fingers utilized need
not be identical and may be mixed and matched to produce proteins which recognize a
desired target sequence. These may also be utilized with leucine zippers (e.g., Fos/Jun)
or other heterodimers to produce proteins with extended sequence recognition.

In addition to producing polymers of finger 1, the entire three finger Zif268 and modified
versions thereof may be fused using the consensus linker TGEKP to produce proteins
with extended recognition sites. For example, the protein Zif268-Zif268 can be produced
in which the natural protein has been fused to itself using the TGEKP linker. This
protein now binds the sequence GCG-TGG-GCG-GCG-TGG-GCG. Therefore
modifications within the three fingers of Zif268 or other zinc finger proteins known in
the art may be fused together to form a protein which recognizes extended sequences.
These new zinc proteins may also be used in combination with leucine zippers if desired.
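
The extended site of such a fusion can be written down by joining the sites of its components. The sketch below is illustrative only: the function names are hypothetical, and the assumption that component sites are simply concatenated in the order given is confirmed by the text only for the Zif268-Zif268 case, where both components are identical.

```python
ZIF268_SITE = ("GCG", "TGG", "GCG")  # 9 bp site of the three-finger protein


def fused_site(*component_sites):
    """Join the triplet subsites of the fused components into one extended site."""
    triplets = [triplet for site in component_sites for triplet in site]
    return "-".join(triplets)


def site_length_bp(site):
    """Number of base pairs in a hyphen-separated site."""
    return len(site.replace("-", ""))


site = fused_site(ZIF268_SITE, ZIF268_SITE)
print(site, site_length_bp(site), "bp")
```

For the six-finger Zif268-Zif268 protein this reproduces the 18 bp site GCG-TGG-GCG-GCG-TGG-GCG given above.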

The invention now being fully described, it will be apparent to one of ordinary skill in
the art that various changes and modifications can be made without departing from the
spirit or scope of the invention.
EXAMPLES

A recombinant polypeptide containing three of nine of the TFIIIA zinc fingers (Clemens,
et al., Proc. Nat'l Acad. Sci., USA, 89:10822, 1992) has been generated by polymerase
chain reaction (PCR) amplification from the cDNA for TFIIIA and expression in E. coli.
The recombinant protein, termed zf1-3, was purified by ion exchange chromatography
and its binding site within the 5S gene was determined by a combination of DNase I
footprinting and binding to synthetic oligonucleotides (Liao, et al., J. Mol. Biol.,
223:857, 1992). The examples provide experiments which show that the binding of this
polypeptide to its recognition sequence placed close to an active RNA polymerase
promoter could inhibit the activity of that promoter in vitro. To provide such a test
system, a 26 bp oligonucleotide containing the 13 bp recognition sequence for zf1-3 was
cloned into the polylinker region of plasmid pUC19 near the promoter sequence for T7
RNA polymerase. The DNA binding activity of our preparation of recombinant zf1-3
was determined by gel mobility shift analysis with the oligonucleotide containing the
binding site. In addition, in vitro transcription was performed with T7 RNA polymerase
in the presence or absence of the same amounts of the zf1-3 polypeptide used in the DNA
binding titration. For each DNA molecule bound by zf1-3, that DNA molecule is
rendered inactive in transcription. In these examples, therefore, a zinc finger polypeptide
has been produced which fully blocked the activity of a promoter by binding to a nearby
target sequence.

EXAMPLE 1
SEQUENCE-SPECIFIC GENE TARGETING
BY ZINC FINGER PROTEINS

A. From the crystal structure of zif268, it is clear that specific histidine (non-zinc
coordinating his residues) and arginine residues on the surface of the α-helix, the finger
tip, and at helix positions 2, 3, and 6 (immediately preceding the conserved histidine)
participate in hydrogen bonding to DNA guanines. As the number of structures of zinc
finger complexes continues to increase, it will be likely that different amino acids and
different positions may participate in base specific recognition. FIGURE 2 (panel A)
shows the sequence of the three amino-terminal fingers of TFIIIA with basic amino acids
at these positions underlined. Similar to finger 2 of the regulatory protein zif268 (Krox-20)
and fingers 1 and 3 of Sp1, finger 2 of TFIIIA contains histidine and arginine
residues at these DNA contact positions; further, each of these zinc fingers minimally
recognizes the sequence GGG (FIGURE 2, panel B) within the 5S gene promoter.

A recombinant polypeptide containing these three TFIIIA zinc fingers has been generated
by polymerase chain reaction (PCR) amplification from the cDNA for TFIIIA and
expression in E. coli (Clemens, et al., supra). An experiment was designed to determine
whether the binding of this polypeptide to its recognition sequence, placed close to an
active RNA polymerase promoter, would inhibit the activity of that promoter in vitro.

The following experiments were done to provide such a test system. A 23 bp
oligonucleotide (Liao, et al., 1992, supra) containing the 13 bp recognition sequence for
zf1-3 was cloned into the polylinker region of plasmid pBluescript SK+ (Stratagene, La
Jolla, CA), near the promoter sequence for T7 RNA polymerase. The parent plasmid was
digested with the restriction enzyme EcoRV and, after dephosphorylation with calf
intestinal alkaline phosphatase, the phosphorylated 23 bp oligonucleotide was inserted
by ligation with T4 DNA ligase. The ligation product was used for transformation of
DH5α E. coli cells. Clones harboring 23 bp inserts were identified by restriction
digestion of miniprep DNA. The success of cloning was also verified by DNA sequence
analysis. The DNA binding activity of the preparation of recombinant zf1-3 was also
determined by gel mobility shift analysis with a 56 bp radiolabeled EcoRI/XhoI
restriction fragment derived from the clone containing the binding site for zf1-3 and with
the radiolabeled 23 bp oligonucleotide. Gel shift assays were done as described (Liao,
et al., supra; Fried, et al., Nucl. Acids Res., 9:6505, 1981). The result of the latter
analysis is shown in FIGURE 3. Binding reactions (20 µl) also contained 1 µg of
unlabeled plasmid DNA harboring the same 23 bp sequence. In lanes 2-12, the indicated
amounts of zf1-3 were also included in the reactions. After incubation at ambient
temperature for 30 min, the samples were subjected to electrophoresis on a 6%
nondenaturing polyacrylamide gel in 88 mM Tris-borate, pH 8.3, buffer. In each reaction,
a trace amount of the radiolabeled oligonucleotide was used with a constant amount (1
µg) of plasmid DNA harboring the zf1-3 binding site. The reactions of lanes 2-12
contained increasing amounts of the zf1-3 polypeptide. The autoradiogram of the gel is
shown. The results indicate that binding of zf1-3 to the radiolabeled DNA caused a
retardation of electrophoretic mobility. The percentage of radiolabeled DNA molecules
bound by zf1-3 also reflects the percentage of unlabeled plasmid DNA molecules bound.

In vitro transcription experiments were performed with T7 RNA polymerase in the
presence or absence of the same amounts of the zf1-3 polypeptide used in the DNA
binding titration with identical amounts of the plasmid DNA harboring the zf1-3 binding
site. Each reaction contained, in a volume of 25 µl, 1 µg of PvuII-digested pBluescript
SK+ DNA containing the 23 bp binding site for zf1-3 inserted in the EcoRV site of the
vector, 40 units of RNasin, 0.6 mM ATP+UTP+CTP, 20 µM GTP and 10 µCi of α-32P-GTP
and 10 units of T7 RNA polymerase (Stratagene). The reaction buffer was provided
by Stratagene. After incubation at 37°C for 1 hour, the products of transcription were
purified by phenol extraction, concentrated by ethanol precipitation and analyzed on a
denaturing polyacrylamide gel. T7 transcription was monitored by the incorporation of
radioactive nucleotides into a run-off transcript. FIGURE 4 shows an autoradiogram of
a denaturing polyacrylamide gel analysis of the transcription products obtained. In this
experiment, the plasmid DNA was cleaved with the restriction enzyme PvuII and the
expected length of the run-off transcript was 245 bases. Addition of zf1-3 polypeptide
to the reaction repressed transcription by T7 RNA polymerase.

FIGURE 5 shows a graph in which the percentage of DNA molecules bound by zf1-3 in
the DNA gel mobility shift assay (x-axis) versus the percentage of inhibition of T7 RNA
polymerase transcription by the same amounts of zf1-3 (y-axis) has been plotted. Note
that each data point corresponds to identical amounts of zf1-3 used in the two assays.
The one-to-one correspondence of the two data sets is unequivocal. T7 transcription was
monitored by the incorporation of radioactive nucleotides into a run-off transcript.
Transcription was quantitated by gel electrophoresis, autoradiography and densitometry.
Gel mobility shift assays were quantitated in a similar fashion. For each DNA molecule
bound by zf1-3, that DNA molecule is rendered inactive in transcription. In this
experiment, therefore, a zinc finger polypeptide has fully blocked the activity of a
promoter by binding to a nearby target sequence.

B. Since the previous experiment was performed with a prokaryotic RNA polymerase,
the following experiment was performed to determine whether the zinc finger
polypeptide zf1-3 could also block the activity of a eukaryotic RNA polymerase. To test
this, a transcription extract prepared from unfertilized Xenopus eggs (Hartl, et al., J. Cell
Biol., 120:613, 1993) and the Xenopus 5S RNA gene template were used. These extracts
are highly active in transcription of 5S RNA and tRNAs by RNA polymerase III. As a
test template, the 5S RNA gene, which naturally contains the binding sites for TFIIIA and
zf1-3, was used. Each reaction contained 10 µl of a high speed supernatant of the egg
homogenate, 9 ng of TFIIIA, nucleoside triphosphates (ATP, UTP, CTP) at 0.6 mM and
10 µCi of α-32P-GTP and GTP at 20 µM in a 25 µl reaction. All reactions contained 180
ng of a plasmid DNA harboring a single copy of the Xenopus somatic-type 5S RNA
gene, and the reactions of lanes 2 and 3 also contained 300 ng of a Xenopus tRNAmet
gene-containing plasmid. Prior to addition of the Xenopus egg extract and TFIIIA, 0.2
and 0.4 µg of zf1-3 were added to the reactions of lanes 2 and 3, respectively. The
amount of zf1-3 used in the experiment of lane 2 was sufficient to bind all of the 5S
gene-containing DNA in a separate binding reaction. After a 15 min incubation to allow
binding of zf1-3 to its recognition sequence, the other reaction components were added.
After a 2 hour incubation, the products of transcription were purified by phenol
extraction, concentrated by ethanol precipitation and analyzed on a denaturing
polyacrylamide gel. The autoradiogram is shown in FIGURE 6. FIGURE 6 also shows
the result of a control reaction in which no zinc finger protein was added (lane 1). As
a control, lanes 2 and 3 also contained a tRNA gene template, which lacks the binding
site for TFIIIA and zf1-3. 5S RNA transcription was repressed by zf1-3 while tRNA
transcription was not affected. These results demonstrate that zf1-3 blocks the assembly
of a eukaryotic RNA polymerase III transcription complex and show that this effect is
specific for DNA molecules that harbor the binding site for the recombinant zinc finger
protein derived from TFIIIA.

Three-dimensional solution structures have been determined for a protein containing the
first three zinc fingers of TFIIIA using 2D, 3D, and 4D NMR methods. For this purpose,
the protein was expressed and purified from E. coli and uniformly labeled with 13C and
15N. The NMR structure shows that the individual zinc fingers fold into the canonical
finger structure with a small β-sheet packed against an α-helix. The fingers are not
entirely independent in solution but there is evidence of subtle interactions between
them. Using similar techniques the 3D structure of a complex between zf1-3 and a 13 bp
oligonucleotide corresponding to its specific binding site on the 5S RNA gene is
determined and used to provide essential information on the molecular basis for
sequence-specific nucleotide recognition by the TFIIIA zinc fingers. This information
is in turn used in designing new zinc finger derived-nucleotide binding proteins for
regulating the preselected target genes. Similar NMR methods can be applied to
determine the detailed structures of the complexes formed between designed zinc finger
proteins and their target genes as part of a structure-based approach to refine target gene
selectivity and enhance binding affinity.

EXAMPLE 2
ISOLATION OF NOVEL ZINC FINGER-NUCLEOTIDE
BINDING PROTEINS

In order to rapidly sort large libraries of zinc finger variants, a phage surface display
system initially developed for antibody libraries (Barbas, et al., METHODS, 2:119, 1991)
was used. To this end, pComb3 has been modified for zinc finger selection. The
antibody light chain promoter and cloning sequences have been removed to produce a
new vector, pComb3.5. The zif268 three finger protein has been modified by PCR and
inserted into pComb3.5. The zinc fingers are functionally displayed on the phage as
determined by solid phase assays which demonstrate that phage bind DNA in a sequence
dependent fashion. Site-directed mutagenesis has been performed to insert an NsiI site
between fingers 1 and 2 in order to facilitate library construction. Furthermore, zif268
is functional when fused to a decapeptide tag which allows its binding to be conveniently
monitored. An initial library has been constructed using overlap PCR (Barbas, et al.,
Proc. Natl. Acad. Sci., USA, 89:4457, 1992) to create finger 3 variants where 6 residues
on the amino terminal side of the α-helix involved in recognition were varied with an
NNK doping strategy to provide degeneracy. This third finger originally bound the GCG
3 bp subsite. Selection for binding to an AAA subsite revealed a consensus pattern
appearing in the selected sequences.

The zif268 containing plasmid, pZif89 (Pavletich, et al., Science 252:809, 1991), was
used as the source of zif268 DNA for modification of the zinc fingers. Briefly, pZif89
was cloned into the plasmid, pComb3.5, after amplification by PCR using the following
primers:

ZF: 5'-ATG AAA CTG CTC GAG CCC TAT GCT TGC CCT GTC GAG-3'
(SEQUENCE ID NO. 2)

ZR: 5'-GAG GAG GAG GAG ACT AGT GTC CTT CTG TCT TAA ATG GAT TTT
GGT-3' (SEQUENCE ID NO. 3).

The PCR reaction was performed in a 100 ul reaction containing 1 ug of each of
oligonucleotide primers ZF and ZR, dNTPs (dATP, dCTP, dGTP, dTTP), 1.5 mM MgCl2,
Taq polymerase (5 units), 10 ng template pZif89, and 10 ul 10X PCR buffer (Perkin-Elmer
Corp.). Thirty rounds of PCR amplification in a Perkin-Elmer Cetus 9600 GeneAmp
PCR system thermocycler were performed. The amplification cycle consisted of
denaturing at 94 °C for one minute, annealing at 54 °C for one minute, followed by
extension at 72 °C for two minutes. The resultant PCR amplification products were gel
purified as described below and digested with XhoI/SpeI and ligated into pComb3.5.
pComb3.5 is a variant of pComb3 (Barbas, et al., Proc. Natl. Acad. Sci. USA, 88:7978,
1991) which has the light chain region, including its lacZ promoter, removed. Briefly,
pComb3 was digested with NheI, Klenow treated, digested with XbaI, and religated to
form pComb3.5. Other similar vectors which could be used in place of pComb3.5, such
as Surf Zap™ (Stratagene, La Jolla, CA), will be known to those of skill in the art.

The phagemid pComb3.5 containing zif268 was then used in PCR amplifications as
described herein to introduce nucleotide substitutions into the zinc fingers of zif268, to
produce novel zinc fingers which bind to specific recognition sequences and which
enhance or repress transcription after binding to a given promoter sequence.

The methods of producing novel zinc fingers with particular sequence recognition
specificity and regulation of gene expression capabilities involved the following steps:

1. A first zinc finger (e.g., zinc finger 3 of zif268) was first randomized through the use
of overlap PCR;

2. Amplification products from the overlap PCR containing randomized zinc fingers
were ligated back into pComb3.5 to form a randomized library;

3. Following expression of bacteriophage coat protein III-anchored zinc finger from the
library, the surface protein expressing phage were panned against specific zinc finger
recognition sequences, resulting in the selection of several specific randomized zinc
fingers; and

4. Following selection of sequence-specific zinc fingers, the corresponding phagemids
were sequenced and the amino acid residue sequence was derived therefrom.

EXAMPLE 3 

PREPARATION OF RANDOMIZED ZINC FINGERS 

To randomize the zinc fingers of zif268 in pComb3.5, described above, two separate
PCR amplifications were performed for each finger as described herein, followed by a
third overlap PCR amplification that resulted in the annealing of the two previous
amplification products, followed by a third amplification. The nucleotide sequence of
zinc finger of zif268 of template pComb3.5 is shown in FIGURE 7 and is listed in
SEQUENCE ID NO. 4. The nucleotide positions that were randomized in zinc finger 3
began at nucleotide position 217 and ended at position 237, excluding serine. The
template zif268 sequence at that specified site encoded eight total amino acid residues
in finger 3. This amino acid residue sequence of finger 3 in pComb3.5 which was to be
modified is Arg-Ser-Asp-Glu-Arg-Lys-Arg-His (SEQUENCE ID NO. 5). The
underlined amino acids represent those residues which were randomized.

A pool of oligonucleotides which included degenerate oligonucleotide primers,
designated BZF3 and ZF36K, and non-degenerate primers R3B and FTX3 having the
nucleotide formula described below (synthesized by Operon Technologies, Alameda,
CA), were used for randomizing the zinc finger 3 of zif268 in pComb3.5. The six triplet
codons for introducing randomized nucleotides included the repeating sequence NNM
(complement of NNK), where M can be either A or C and N can be A, C, G or T.
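
The NNK doping strategy can be illustrated with a short sketch. It assumes the standard
genetic code and the IUPAC convention that K is G or T and M is A or C; the helper names
expand and reverse_complement are illustrative only and do not come from the procedure
above. It enumerates the codons covered by NNK, shows which amino acids and stop codons
they encode, and confirms that the opposite strand reads MNN.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC degeneracy codes used here (assumed convention).
IUPAC = {"N": "ACGT", "K": "GT", "M": "AC"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def expand(pattern):
    """List every concrete sequence matched by a degenerate pattern."""
    return ["".join(p) for p in product(*(IUPAC.get(ch, ch) for ch in pattern))]


def reverse_complement(seq):
    """Return the reverse complement of a concrete DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


nnk_codons = expand("NNK")
amino_acids = sorted({CODON_TABLE[c] for c in nnk_codons} - {"*"})
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

# Theoretical diversity for six randomized positions.
dna_diversity = len(nnk_codons) ** 6
protein_diversity = len(amino_acids) ** 6

print(len(nnk_codons), "codons,", len(amino_acids), "amino acids, stops:", stop_codons)
print("DNA variants:", dna_diversity, "protein variants:", protein_diversity)
```

Restricting the third codon position to G or T keeps every amino acid available while
admitting only one of the three stop codons, which is the usual reason for choosing this
kind of doping scheme.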

The first PCR amplification resulted in the amplification of the 5' region of the zinc
finger 3 fragment in the pComb3.5 phagemid vector clone. To amplify this region, the
following primer pairs were used. The 5' oligonucleotide primer, FTX3, having the
nucleotide sequence 5'-GCA ATT AAC CCT CAC TAA AGG G-3' (SEQUENCE ID
NO. 6), hybridized to the noncoding strand of finger 3 corresponding to the region 5'
(including the vector sequence) of and including the first two nucleotides of zif268. The
3' oligonucleotide primer, BZF3, having the nucleotide sequence 5'-GGC AAA CTT
CCT CCC ACA AAT-3' (SEQUENCE ID NO. 7) hybridized to the coding strand of the
finger 3 beginning at nucleotide 216 and ending at nucleotide 196.

The PCR reaction was performed in a 100 microliter (ul) reaction containing one
microgram (ug) of each of oligonucleotide primers FTX3 and BZF3, 200 millimolar
(mM) dNTPs (dATP, dCTP, dGTP, dTTP), 1.5 mM MgCl2, Taq polymerase (5 units)
(Perkin-Elmer Corp., Norwalk, CT), 10 nanograms (ng) of template pComb3.5 zif268,
and 10 ul of 10X PCR buffer purchased commercially (Perkin-Elmer Corp.). Thirty
rounds of PCR amplification in a Perkin-Elmer Cetus 9600 GeneAmp PCR System
thermocycler were then performed. The amplification cycle consisted of denaturing at
94°C for 30 seconds, annealing at 50°C for 30 seconds, followed by extension at 72°C
for one minute. To obtain sufficient quantities of amplification product, 30 identical
PCR reactions were performed.

The resultant PCR amplification products were then gel purified on a 1.5% agarose gel
using standard electroelution techniques as described in "Molecular Cloning: A
Laboratory Manual", Sambrook, et al., eds. After gel electrophoresis of the digested
PCR amplified zinc finger domain, the region of
the gel containing the DNA fragments of predetermined size was excised, electroeluted
into a dialysis membrane, ethanol precipitated and resuspended in buffer containing 10
mM Tris-HCl, pH 7.5 and 1 mM EDTA to a final concentration of 50 ng/ml.

The purified resultant PCR amplification products from the first reaction were then used
in an overlap extension PCR reaction with the products of the second PCR reaction, both
as described below, to recombine the two products into reconstructed zif268 containing
randomized zinc fingers.

The second PCR reaction resulted in the amplification of the 3' end of zif268 finger 3
overlapping with the above products and extending 3' of finger 3. To amplify this region
for randomizing the encoded eight amino acid residue sequence of finger 3, the following
primer pairs were used. The 5' coding oligonucleotide primer pool was designated
ZF36K and had the nucleotide sequence represented by the formula, 5'-ATT TGT GGG
AGG AAG TTT GCC NNK AGT NNK NNK NNK NNK NNK CAT ACC AAA ATC
CAT TTA-3' (SEQUENCE ID NO. 8) (nucleotides 196-255). The 3' noncoding primer,
R3B, hybridized to the coding strand at the 3' end of gene III (gIII) having the sequence
5'-TTG ATA TTC ACA AAC GAA TGG-3' (SEQUENCE ID NO. 9). The region
between the two specified ends of the primer pool is represented by a 15-mer NNK
degeneracy. The second PCR reaction was performed on a second aliquot of pComb3.5
template in a 100 ul reaction as described above containing 1 ug of each of
oligonucleotide primers as described. The resultant PCR products encoded a diverse
population of randomized zif268 finger 3 regions of 8 amino acid residues in length. The
products were then gel purified as described above.

For the annealing reaction of the two PCR amplifications, 1 ug each of gel purified
products from the first and second PCR reactions were then admixed and fused in the
absence of primers for 35 cycles of PCR as described above. The resultant fusion
product was then amplified with 1 ug each of FTX3 and R3B oligonucleotide primers as
a primer pair in a final PCR reaction to form a complete zif268 fragment by overlap
extension. The overlap PCR amplification was performed as described for other PCR
amplifications above.
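
The overlap extension step relies on the two first-round products sharing a common
stretch of sequence. As a minimal sketch, assuming the BZF3 and ZF36K sequences given in
this example (SEQUENCE ID NOS. 7 and 8) and using an illustrative helper name, the
following checks that the reverse complement of BZF3 matches the 5' end of ZF36K, which is
the region where the two products can anneal.

```python
# Complement table, including the degenerate symbols used in the primers.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N", "K": "M", "M": "K"}


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


# Primer sequences written 5'->3' with the spaces removed.
BZF3 = "GGCAAACTTCCTCCCACAAAT"
ZF36K = (
    "ATTTGTGGGAGGAAGTTTGCC"     # fixed 5' arm
    "NNKAGTNNKNNKNNKNNKNNK"     # randomized codons with the fixed serine codon
    "CATACCAAAATCCATTTA"        # fixed 3' arm
)

# BZF3 anneals to the coding strand, so its reverse complement is coding-strand
# sequence and should coincide with the start of ZF36K.
overlap = reverse_complement(BZF3)
shared_length = len(overlap) if ZF36K.startswith(overlap) else 0
randomized_codons = ZF36K.count("NNK")

print("shared overlap:", overlap, "length:", shared_length)
print("ZF36K length:", len(ZF36K), "randomized codons:", randomized_codons)
```

The shared 21 nucleotides correspond to positions 196 to 216, so the first-round products
can prime each other in the primer-free fusion cycles before FTX3 and R3B amplify the
full-length fragment.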

To obtain sufficient quantities of amplification product, 30 identical overlap PCR
reactions were performed. The resulting fragments extended from 5' to 3' and had
randomized finger 3 encoding 6 amino acid residues. The randomized zif268
amplification products of approximately 450 base pairs (bp) in length in each of the 30
reactions were first pooled and then gel purified as described above and cut with XhoI
and SpeI, prior to their religation into the pComb3.5 surface display phagemid
expression vector to form a library for subsequent screening against zinc finger
recognition sequence oligos for selection of a specific zinc finger. The ligation
procedure in creating expression vector libraries and the subsequent expression of the
zif268 randomized pComb3.5 clones was performed as described below in Example 4.

Nucleotide substitutions may be performed on additional zinc fingers as well. For
example, in zif268, fingers 1 and 2 may also be modified so that additional binding sites
may be identified. For modification of zinc finger 2, primers FTX3 (as described above)
and ZFNsi-B, 5'-CAT GCA TAT TCG ACA CTG GAA-3' (SEQUENCE ID NO. 10)
(nucleotides 100-120) are used for the first PCR reaction, and R3B (described above) and
ZF2r6F (5'-CAG TGT CGA ATA TGC ATG CGT AAC TTC (NNK)6 ACC ACC CAC
ATC CGC ACC CAC-3') (SEQUENCE ID NO. 11) (nucleotides 103 to 168) are used
for the second reaction. For modification of finger 1, FTX3 (above) and ZFI6rb (5'-CTG
GCC TGT GTG GAT GCG GAT ATG (MNN)5 CGA MNN AGA AAA GCG GCG
ATC GCA GGA-3') (SEQUENCE ID NO. 12) (nucleotides 28 to 93) are used for the
first reaction and ZFIF (5'-CAT ATC CGC ATC CAC ACA GGC CAG-3')
(SEQUENCE ID NO. 13) (nucleotides 70 to 93) and R3B (above) are used in the second
reaction. The overlap reaction utilizes FTX3 and R3B as described above for finger 3.
Preferably, each finger is modified individually and sequentially on one protein
molecule, as opposed to all three in one reaction. The nucleotide modifications of finger
1 of zif268 would include the underlined amino acids R S D E L T R H (SEQUENCE
ID NO. 14), which is encoded by nucleotides 49 to 72. The nucleotide modifications of
finger 2 of zif268 would include S R S D H L (SEQUENCE ID NO. 15), which is
encoded by nucleotides 130 to 147. (See FIGURE 7).

EXAMPLE 4 

PREPARATION OF PHAGEMID-DISPLAYED SEQUENCES

HAVING RANDOMIZED ZINC FINGERS 

The phagemid pComb3.5 containing zif268 sequences is a phagemid expression vector
that provides for the expression of phage-displayed anchored proteins, as described
above. The original pComb3 expression vector was designed to allow for anchoring of
expressed antibody proteins on the bacteriophage coat protein 3 for the cloning of
combinatorial Fab libraries. XhoI and SpeI sites were provided for cloning complete
PCR-amplified heavy chain (Fd) sequences consisting of the region beginning with
framework 1 and extending through framework 4. Gene III of filamentous phage
encodes this 406-residue minor phage coat protein, cpIII (cp3), which is expressed prior
to extrusion in the phage assembly process on a bacterial membrane and accumulates on
the inner membrane facing into the periplasm of E. coli.

In this system, the first cistron encodes a periplasmic secretion signal (pelB leader)
operatively linked to the fusion protein, zif268-cpIII. The presence of the pelB leader
facilitates the secretion of the fusion protein containing randomized zinc finger from
the bacterial cytoplasm into the periplasmic space.

By this process, the zif268-cpIII was delivered to the periplasmic space by the pelB
leader sequence, which was subsequently cleaved. The randomized zinc finger was
anchored in the membrane by the cpIII membrane anchor domain. The phagemid vector,
designated pComb3.5, allowed for surface display of the zinc finger protein. The
presence of the XhoI/SpeI sites allowed for the insertion of XhoI/SpeI digests of the
randomized zif268 PCR products in the pComb3.5 vector. Thus, the ligation of the
zif268 mutagenized nucleotide sequence prepared in Example 3 resulted in the in-frame
ligation of a complete zif268 fragment consisting of PCR amplified finger 3. The
cloning sites in the pComb3.5 expression vector were compatible with previously
reported mouse and human PCR primers as described by Huse, et al., Science,
246:1275-1281 (1989) and Persson, et al., Proc. Natl. Acad. Sci. USA, 88:2432-2436
(1991). The nucleotide sequence of the pelB, a leader sequence for directing the
expressed protein to the periplasmic space, was as reported by Huse, et al., supra.

The vector also contained a ribosome binding site as described by Shine, et al. (Nature,
254:34, 1975). The sequence of the phagemid vector, pBluescript, which includes ColE1
and F1 origins and a beta-lactamase gene, has been previously described by Short, et al.,
Nuc. Acids Res., 16:7583-7600, (1988) and has the GenBank Accession Number 52330
for the complete sequence. Additional restriction sites, SalI, AccI, HincII, ClaI, HindIII,
EcoRV, PstI and SmaI, located between the XhoI and SpeI sites of the empty vector were
derived from a 51 base pair stuffer fragment of pBluescript as described by Short, et al.,
supra. A nucleotide sequence that encodes a flexible 5 amino acid residue tether
sequence which lacks an ordered secondary structure was juxtaposed between the Fab
and cp3 nucleotide domains so that interaction in the expressed fusion protein was
minimized.

Thus, the resultant combinatorial vector, pComb3.5, consisted of a DNA molecule
having a cassette to express a fusion protein, zif268/cp3. The vector also contained
nucleotide residue sequences for the following operatively linked elements listed in a 5'
to 3' direction: the cassette consisting of LacZ promoter/operator sequences; a NotI
restriction site; a ribosome binding site; a pelB leader; a spacer region; a cloning region
bordered by 5' XhoI and 3' SpeI restriction sites; the tether sequence; and the sequences
encoding bacteriophage cp3 followed by a stop codon. A NheI restriction site located
between the original two cassettes (for heavy and light chains); a second lacZ
promoter/operator sequence followed by an expression control ribosome binding site; a
pelB leader; a spacer region; a cloning region bordered by 5' SacI and a 3' XbaI
restriction sites followed by expression control stop sequences and a second NotI
restriction site were deleted from pComb3 to form pComb3.5. Those of skill in the art
will know of similar vectors that could be utilized in the method of the invention, such as
the Surf Zap™ vector (Stratagene, La Jolla, CA).

In the above expression vector, the zif268/cp3 fusion protein is placed under the control
of a lac promoter/operator sequence and directed to the periplasmic space by pelB leader
sequences for functional assembly on the membrane. Inclusion of the phage F1
intergenic region in the vector allowed for the packaging of single-stranded phagemid
with the aid of helper phage. The use of helper phage superinfection allowed for the
expression of two forms of cp3. Consequently, normal phage morphogenesis was
perturbed by competition between the Fd/cp3 fusion and the native cp3 of the helper
phage for incorporation into the virion. The resulting packaged phagemid carried native
cp3, which is necessary for infection, and the encoded fusion protein, which is displayed
for selection. Fusion with the C-terminal domain was necessitated by the phagemid
approach because fusion with the infective N-terminal domain would render the host cell
resistant to infection.

The pComb3 and 3.5 expression vector described above forms the basic construct of the
display phagemid expression vector used in this invention for the production of
randomized zinc finger proteins.

EXAMPLE 5
PHAGEMID LIBRARY CONSTRUCTION 

In order to obtain expressed protein representing randomized zinc fingers, phagemid
libraries were constructed. The libraries provided for surface expression of recombinant
molecules where zinc fingers were randomized as described in Example 3.

For preparation of phagemid libraries for expressing the PCR products prepared in
Example 3, the PCR products were first digested with XhoI and SpeI and separately
ligated with a similarly digested original (i.e., not randomized) pComb3.5 phagemid
expression vector. The XhoI and SpeI sites were present in the pComb3.5 vector as
described above. The ligation resulted in operatively linking the zif268 to the vector,
located 5' to the cp3 gene. Since the amplification products were inserted into the
template pComb3.5 expression vector that originally had the heavy chain variable
domain sequences, only the heavy chain domain cloning site was replaced leaving the
rest of the pComb3.5 expression vector unchanged. Upon expression from the
recombinant clones, the expressed proteins contained a randomized zinc finger.

Phagemid libraries for expressing each of the randomized zinc fingers of this invention
were prepared in the following procedure. To form circularized vectors containing the
PCR product insert, 640 ng of the digested PCR products were admixed with 2 ug of the
linearized pComb3.5 phagemid vector and ligation was allowed to proceed overnight at
room temperature using 10 units of BRL ligase (Gaithersburg, MD) in BRL ligase buffer
in a reaction volume of 150 ul. Five separate ligation reactions were performed to
increase the size of the phage library having randomized zinc fingers. Following the
ligation reactions, the circularized DNA was precipitated at -20 °C for 2 hours by the
admixture of 2 ul of 20 mg/ml glycogen, 15 ul of 3 M sodium acetate at pH 5.2 and 300
ul of ethanol. DNA was then pelleted by microcentrifugation at 4 °C for 15 minutes. The
DNA pellet was washed with cold 70% ethanol and dried under vacuum. The pellet was
resuspended in 10 ul of water and transformed by electroporation into 300 ul of E. coli
XL1-Blue cells to form a phage library.

After transformation, to isolate phage expressing mutagenized finger 3, phage were
induced as described below for subsequent panning on a hairpin oligo having the
following sequence (SEQUENCE ID NO. 16):

NH2-CGT-AAA-TGG-GCG-CCC-T
                         T
                         T
    GCA-TTT-ACC-CGC-GGG-T

The bold sequence (AAA) indicates the new zinc finger 3 binding site (formerly GCG), the
underlined sequence (TGG) represents the finger 2 site and the double underlining (GCG)
represents the finger 1 binding site.
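
The base pairing of the hairpin target can be checked with a short sketch. It assumes
that the upper strand is written 5' to 3', that the lower strand is printed 3' to 5'
directly beneath it, and that the loop consists of four T residues; the variable names
are illustrative only.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Return the base-by-base complement without reversing the sequence."""
    return "".join(COMPLEMENT[b] for b in seq)


top = "CGTAAATGGGCGCCC"             # upper strand, read 5'->3'
loop = "TTTT"                       # assumed four-residue loop
bottom_printed = "GCATTTACCCGCGGG"  # lower strand as printed, read 3'->5'

# A fully paired stem means the printed lower strand is the complement of the top.
stem_is_paired = complement(top) == bottom_printed

# The single-stranded oligo written 5'->3' runs along the top, around the loop,
# and back along the lower strand in reverse order.
hairpin = top + loop + bottom_printed[::-1]

# Three-base subsites on the upper strand, following the description of the oligo.
subsites = {"finger 3": top[3:6], "finger 2": top[6:9], "finger 1": top[9:12]}

print("stem fully paired:", stem_is_paired)
print("oligo 5'->3':", hairpin, "length:", len(hairpin))
print(subsites)
```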

Transformed E. coli were grown in 3 ml of SOC medium (SOC was prepared by
admixture of 20 grams (g) bacto-tryptone, 5 g yeast extract and 0.5 g NaCl in 1 liter of
water, adjusting the pH to 7.5 and admixing 20 ml of glucose just before use to induce
the expression of the zif268-cpIII), and the culture was shaken at 220 rpm
for 1 hour at 37 °C. Following this incubation, 10 ml of SB (SB was prepared by
admixing 30 g tryptone, 20 g yeast extract, and 10 g Mops buffer per liter with pH
adjusted to 7) containing 20 ug/ml carbenicillin and 10 ug/ml tetracycline were admixed
and the admixture was shaken at 300 rpm for an additional hour. This resultant
admixture was admixed to 100 ml SB containing 50 ug/ml carbenicillin and 10 ug/ml
tetracycline and shaken for 1 hour, after which helper phage VCSM13 (10^12 pfu) were
admixed and the admixture was shaken for an additional 2 hours at 37°C. After this
time, 70 ug/ml kanamycin was admixed and maintained at 30 °C overnight. The lower
temperature resulted in better expression of zif268 on the surface of the phage. The
supernatant was cleared by centrifugation (4000 rpm for 15 minutes in a JA10 rotor at
4°C). Phage were precipitated by admixture of 4% (w/v) polyethylene glycol 8000 and
3% (w/v) NaCl and maintained on ice for 30 minutes, followed by centrifugation (9000
rpm for 20 minutes in a JA10 rotor at 4°C). Phage pellets were resuspended in 2 ml of
buffer (5 mM DTT, 10 mM Tris-HCl, pH 7.56, 90 mM KCl, 90 mM ZnCl2, 1 mM MgCl2) and
microcentrifuged for three minutes to pellet debris, transferred to fresh tubes and stored
at -20°C for subsequent screening as described below. DTT was added for refolding of
the polypeptide on the phage surface.



For determining the titering colony forming units (cfu), phage (packaged phagemid) were
diluted in SB and 1 ul was used to infect 50 ul of fresh (OD600 = 1) E. coli XL1-Blue
cells grown in SB containing 10 ug/ml tetracycline. Phage and cells were maintained at
room temperature for 15 minutes and then directly plated on LB/carbenicillin plates. The
randomized zinc finger 3 library consisted of 5 x 10^7 PFU total.

Multiple Pannings of the Phage Library 

The phage library was panned against the hairpin oligo containing an altered binding site,
as described above, on coated microtiter plates to select for novel zinc fingers.

The panning procedure used, comprised of several rounds of recognition and replication,
was a modification of that originally described by Parmley and Smith (Parmley, et al.,
Gene, 73:305-318, 1988; Barbas, et al., 1991, supra). Five rounds of panning were
performed to enrich for sequence-specific binding clones. For this procedure, four wells
of a microtiter plate (Costar 3690) were coated by drying overnight at 37 °C with 1 ug of
the oligo, or the oligo was covalently attached to BSA with EDC/NHS activation to coat the
plate (360 ug acetylated BSA (Boehringer Mannheim), 577 ug oligo, 40 mM NHS, and
100 mM EDC were combined in 1.8 ml total volume and incubated overnight at room
temperature). The plates were coated using 50 ul per plate and incubated at 4°C
overnight. The wells were washed twice with water and blocked by completely filling
the well with 3% (w/v) BSA in PBS and maintaining the plate at 37°C for one hour.
After the blocking solution was shaken out, 50 ul of the phage suspension prepared above
(typically 10^11 pfu) were admixed to each well, and the plate was maintained for 2 hours
at 37°C.

Phage were removed and the plate was washed once with water. Each well was then
washed 10 times with TBS/Tween (50 mM Tris-HCl at pH 7.5, 150 mM NaCl, 0.5%
Tween 20) over a period of 1 hour at room temperature where the washing consisted of
pipetting up and down to wash the well, each time allowing the well to remain
completely filled with TBS/Tween between washings. The plate was washed once more
with distilled water and adherent phage were eluted by the addition of 50 ul of elution
buffer (0.1 M HCl, adjusted to pH 2.2 with solid glycine, containing 1 mg/ml BSA) to
each well followed by maintenance at room temperature for 10 minutes. The elution
buffer was pipetted up and down several times, removed, and neutralized with 3 ul of 2
M Tris base per 50 ul of elution buffer used.

Eluted phage were used to infect 2 ml of fresh (OD600 = 1) E. coli XL1-Blue cells for 15
minutes at room temperature, after which time 10 ml of SB containing 20 ug/ml
carbenicillin and 10 ug/ml tetracycline was admixed. Aliquots of 20, 10, and 1/10 ul
were removed from the culture for plating to determine the number of phage (packaged
phagemids) that were eluted from the plate. The culture was shaken for 1 hour at 37 °C,
after which it was added to 100 ml of SB containing 50 ug/ml carbenicillin and 10 ug/ml
tetracycline and shaken for 1 hour. Helper phage VCSM13 (10^12 pfu) were then added
and the culture was shaken for an additional 2 hours. After this time, 70 ug/ml
kanamycin was added and the culture was incubated at 37 °C overnight. Phage
preparation and further panning were repeated as described above.

Following each round of panning, the percentage yield of phage was determined, where
% yield = (number of phage eluted/number of phage applied) X 100. The initial phage
input ratio was determined by titering on selective plates to be approximately 10^11 cfu for
each round of panning. The final phage output ratio was determined by infecting two ml
of logarithmic phase XL1-Blue cells as described above and plating aliquots on selective
plates. From this procedure, clones were selected from the library for their ability
to bind to the new binding sequence oligo. The selected clones had randomized zinc
finger 3 domains.
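
The percentage yield formula lends itself to a small worked sketch. The function names
and the per-round phage counts below are hypothetical and chosen only for illustration;
they are not results from the panning described here.

```python
def percent_yield(phage_eluted, phage_applied):
    """% yield = (number of phage eluted / number of phage applied) x 100."""
    if phage_applied <= 0:
        raise ValueError("phage_applied must be positive")
    return phage_eluted / phage_applied * 100.0


def fold_enrichment(yields):
    """Ratio of each round's yield to the yield of the preceding round."""
    return [later / earlier for earlier, later in zip(yields, yields[1:])]


# Hypothetical (applied, eluted) counts for three rounds of panning.
rounds = [(1e11, 2e5), (1e11, 1e6), (1e11, 4e7)]
yields = [percent_yield(eluted, applied) for applied, eluted in rounds]

for number, value in enumerate(yields, start=1):
    print(f"round {number}: {value:.2e} % yield")
print("fold enrichment between rounds:", fold_enrichment(yields))
```

A yield that rises from round to round at a constant input is the usual indication that
binding clones are being enriched.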

The results from sequential panning of the randomized zinc finger 3 library revealed five
binding sequences which recognized the new finger 3 site. The native site, GCG, was
altered to AAA and the following sequences shown in Table 1 were identified to bind
AAA.

TABLE 1
BINDING SEQUENCE

SEQUENCE ID NO. 17     RSD ERK RH (1)
SEQUENCE ID NO. 18     WSI PVL LH
SEQUENCE ID NO. 19     WSL LPV LH
SEQUENCE ID NO. 20     FSF LLP LH
SEQUENCE ID NO. 21     LST WRG WH
SEQUENCE ID NO. 22     TSI QLP YH

(1) RSD ERK RH is the native Finger 3 binding sequence.



EXAMPLE 6 

COTRANSFORMATION ASSAY FOR IDENTIFICATION 
OF ZINC FINGER ACTIVATION OF PROMOTER 

In order to assess the functional properties of the new zinc fingers generated, an E. coli
based in vivo system has been devised. This system utilizes two plasmids with the
compatible replicons colE1 and p15. Cytoplasmic expression of the zinc finger is
provided by the arabinose promoter in the colE1 plasmid. The p15 replicon-containing
plasmid contains a zinc finger binding site in place of the repressor binding site in a
plasmid which expresses the α fragment of β-galactosidase. The binding of the zinc
finger to this site on the second plasmid shuts off the production of β-galactosidase and
thus novel zinc fingers can be assessed in this in vivo assay for function using a
convenient blue/white selection. For example, in the presence of arabinose and lactose,
the zinc finger gene is expressed, the protein product binds to the zinc finger binding site
and represses the lactose promoter. Therefore, no β-galactosidase is produced and white
plaques would be present. This system, which is compatible with respect to restriction
sites with pComb3.5, will facilitate the rapid characterization of novel fingers.
Furthermore, this approach could be extended to allow for the genetic selection of novel
transcriptional regulators.

Another method of mutagenizing a wild type zinc finger-nucleotide binding protein
includes segmental shuffling using a PCR technique which allows for the shuffling of
gene segments between collections of genes. Preferably, the genes contain limited
regions of homology, and at least 15 base pairs of contiguous sequence identity.
Collections of zinc finger genes in the vector pComb3.5 are used as templates for the
PCR technique. Four cycles of PCR are performed by denaturation, for example, for 1
min at 94 °C and annealing at 50 °C for 15 seconds. In separate experiments PCR is
performed at 94°C, 1 min, 50°C, 30 sec; 94°, 1 min, 50°, 1 min; 94°, 1 min, 50°, 15 sec,
72°, 1 sec. All experiments use the same template (a 10 ng mixture). The experiment is
performed such that under each condition two sets of reactions are performed. Each set
has only a top or a bottom strand primer, which leads to the generation of single-stranded
DNAs of different lengths. For example, FTX3, ZFIF and FZF3 primers may be used
in a separate set to give single stranded products. The products from these reactions are
then pooled and additional 5' and 3' terminal primers (e.g., FTX3 and R3B) are added and
the mix is subjected to 35 additional rounds of PCR at 94°C, 1 min, 50°, 15 sec, 72°, 1
min 30 sec. The resultant mixture may then be cloned by Xho I/Spe I digestion. The
new shuffled zinc fingers can be selected as described above, by panning a display of
zinc fingers on any genetic package for selection of the optimal zinc-finger collections.
This technique may be applied to any collection of genes which contain at least 15 bp of
contiguous sequence identity. Primers may also be doped to a defined extent as
described above using the NNK example, to introduce mutations in primer binding
regions. Reaction times may be varied depending on length of template and number of
primers used.

EXAMPLE 7
MODIFICATION OF SPECIFICITY OF Zif268 

Reagents, Strains, and Vectors 

Restriction endonucleases were obtained from New England Biolabs or Boehringer
Mannheim. T4 DNA ligase was the product of GIBCO BRL. Taq polymerase and Vent
polymerase were purchased from Promega. Heparin-Sepharose CL-6B medium was from
Pharmacia. Oligonucleotides were from Operon Technologies (Alameda, CA), or
prepared on a Gene Assembler Plus (Pharmacia LKB) in the laboratory. pZif89 was a
gift from Drs. Pavletich and Pabo (Pavletich, Science, 252:809-817, 1991). Escherichia
coli BL21(DE3)pLysS and plasmid pET3a were from Novagen. Escherichia coli XL1-Blue,
phage VCSM13, the phagemid vector pComb3, and pAraHA are as described
(Barbas III, et al., Proc. Natl. Acad. Sci. USA, 88:7978-7982, 1991; Barbas III, et al.,
Methods: A Companion to Methods in Enzymology, 2:119-124, 1991).

Plasmid Construction 

Genes encoding wild-type zinc-finger proteins were placed under the control of the
Salmonella typhimurium araB promoter by insertion of a DNA fragment amplified by
the polymerase chain reaction (PCR) and containing the wild-type Zif268 gene of pZif89
(Pavletich, supra) with the addition of multiple restriction sites (XhoI/SacI and
XbaI/SpeI). The resulting plasmid vector was subsequently used for subcloning the
selected zinc-finger genes for immunoscreening. In this vector the zinc finger protein
is expressed as a fusion with a hemagglutinin decapeptide tag at its C-terminus which
may be detected with an anti-decapeptide monoclonal antibody (FIGURE 8A) (Field, et
al., Mol. & Cell. Biol., 8:2159-2165, 1988). In FIGURE 8A, the Zif268 protein is aligned
to show the conserved features of each zinc finger. The α-helices and antiparallel
β-sheets are indicated. Six amino-acid residues underlined in each finger sequence were
randomized in library constructions. The C-terminal end of Zif268 protein was fused with
a fragment containing a decapeptide tag. The position of fusion is indicated by an arrow.
The phagemid pComb3 was modified by digestion with NheI and XbaI to remove the
antibody light chain fragment, filled with Klenow fragment, and the backbone was
self-ligated, yielding plasmid pComb3.5. The Zif268 PCR fragment was inserted into
pComb3.5, as above. To eliminate background problems in library construction, a 1.1-kb
nonfunctional stuffer was substituted for the wild-type Zif268 gene using SacI and XbaI.
The resulting plasmid was digested with SacI and XbaI to excise the stuffer, and the
pComb3.5 backbone was gel-purified and served as the vector for library construction.

Zinc Finger Libraries

Three zinc-finger libraries were constructed by PCR overlap extension using conditions
previously described in Example 3. Briefly, for the finger 1 library, primer pairs A (5'-GTC
CAT AAG ATT AGC GGA TCC-3') (SEQ. ID NO:29) and Zf16rb (SEQ. ID NO:12)
(where N is A, T, G, or C, and M is A or C), and B (5'-GTG AGC GAG GAA GCG
GAA GAG-3') (SEQ. ID NO:30) and Zf1f (SEQ. ID NO:13) were used to amplify
fragments of the Zif268 gene using plasmid pAra-Zif268 as a template. The two PCR
fragments were mixed at an equal molar ratio and the mixture was used as the template
for overlap extension. The recombinant fragments were then PCR-amplified using primers
A and B, and the resulting product was digested with SacI and XbaI and gel purified. For
each ligation reaction, 280 ng of digested fragment was ligated with 1.8 µg of pComb3.5
vector at room temperature overnight. Twelve reactions were performed, and the DNA
was ethanol-precipitated and electroporated into E. coli XL1-Blue. The libraries of
fingers 2 and 3 were constructed in a similar manner except that the PCR primers Zf16rb
and Zf1f used in finger 1 library construction were replaced by Zfnsi-B (SEQ. ID
NO:10) and ZF2r6F (SEQ. ID NO:11) (where K is G or T) for the finger 2 library, and by
BZF3 (SEQ. ID NO:7) and ZF36K (SEQ. ID NO:8) for the finger 3 library. In the libraries,
six amino-acid residues corresponding to the α-helix positions -1, 2, 3, 4, 5, 6 of fingers
1 and 3, and positions -2, -1, 1, 2, 3, 4 of finger 2, were randomized (FIGURE 8A).
In Vitro Selection of Zinc Fingers

A 34-nucleotide hairpin DNA containing either the consensus or an altered Zif268 binding
site was used for zinc-finger selection (FIGURE 8). The consensus binding site is denoted
as Z268N (5'-CCT GCG TGG GCG CCC TTTT GGG CGC CCA CGC AGG-3') (SEQ.
ID NO: 31). The altered site for finger 1 is TGT (5'-CCT GCG TGG TGT CCC TTTT
GGG ACA CAA CGC AGG-3'), for finger 2 is TTG (5'-CCT GCG TTG GCG CCC
TTTT GGG CGC CAA CGC AGG-3'), and for finger 3 is CTG (5'-CCT CTG TGG GCG
CCC TTTT GGG CGC CCA CAG AGG-3'). The oligonucleotide was synthesized with
a primary n-hexyl amino group at its 5' end. A DNA-BSA conjugate was prepared by
mixing 30 DNA with 3 mM acetylated BSA in a solution containing 100 mM
1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride (EDCI) and 40 mM
N-hydroxysuccinimide (NHS) at room temperature for 5 hours or overnight. Zif268 phage,
10^12 colony forming units, in 50 µl zinc buffer (10 mM Tris-Cl, pH 7.5, 90 mM KCl, 1
mM MgCl2, 90 µM ZnCl2, and 5 mM DTT) containing 1% BSA was
applied to a microtiter well precoated with 4.9 µg of DNA-BSA conjugate in 25 µl PBS
buffer (10 mM potassium phosphate, 160 mM NaCl, pH 7.4) per well. After 2 hours of
incubation at 37 °C, the phage was removed and the plate washed once with TBS buffer (50
mM Tris-Cl and 150 mM NaCl, pH 7.5) containing 0.5% Tween for the first round of
selection. The plate was washed 5 times for round 2, and 10 times for further rounds.
Bound phage was extracted with elution buffer (0.1 M HCl, pH 2.2 (adjusted with
glycine), and 1% BSA), and used to infect E. coli XL1-Blue cells to produce phage for
the subsequent selection.

Immunoscreening 

Mutant zinc finger genes selected after five or six rounds of panning were subcloned into
the pAraHA vector using XhoI and SpeI restriction sites. Typically, 20 clones were
screened at a time. Cells were grown at 37 °C to late-log phase (OD = 0.8-1) in 6 ml of
SB medium (Barbas III, et al., supra) containing 30 µg/ml chloramphenicol. Expression
of zinc-finger proteins was induced by addition of 1% arabinose. Cells were
harvested 3 to 12 hours following induction. Cell pellets were resuspended in 600 µl of
zinc buffer containing 0.5 mM phenylmethylsulfonyl fluoride (PMSF). Cells were lysed
with 6 freeze-thaw cycles and the supernatant was clarified by centrifugation at 12,000g
for 5 minutes. A 50-µl aliquot of cell supernatant was applied to a microtiter well
precoated with 1.1 µg of DNA-BSA conjugate. After 1 hour at 37 °C, the plate was
washed 10 times with distilled water, and an alkaline phosphatase conjugated
anti-decapeptide antibody was added to the plate. After 30 minutes at 37 °C, the plate was
washed 10 times and p-nitrophenylphosphate was added. The plate was then monitored
with a microplate autoreader at 405 nm.

Overexpression and Purification of Zinc-Finger Proteins 

Zinc finger proteins were overproduced by using the pET expression system (Studier, et
al., Methods Enzymol., 185:60-89, 1990). The Zif268 gene was introduced following
PCR into NdeI and BamHI digested vector pET3a. Subsequently, the Zif268 gene was
replaced with a 680-bp nonfunctional stuffer fragment. The resulting pET plasmid
containing the stuffer fragment was used for cloning other zinc-finger genes by replacing
the stuffer with zinc-finger genes using SpeI and XhoI sites. The pET plasmids encoding
zinc-finger genes were introduced into BL21(DE3)pLysS by chemical transformation.
Cells were grown to mid-log phase (OD = 0.4-0.6) in SB medium containing 50 µg/ml
carbenicillin and 30 µg/ml chloramphenicol. Protein expression was induced by addition
of 0.7 mM IPTG to the medium. Typically, 500-ml cultures were harvested three hours
after induction. Cell pellets were resuspended in zinc buffer containing 1 mM PMSF
and cells were lysed by sonication for 5 minutes at 0 °C. Following addition of 6 mM
MgCl2, cell lysates were incubated with 10 µg/ml DNase I for 20 minutes on ice.
Inclusion bodies containing zinc finger protein were collected by centrifugation at
25,000g for 30 minutes and were resuspended and solubilized in 10 ml of zinc buffer
containing 6 M urea and 0.5 mM PMSF with gentle mixing for 3 to 12 hours at 4 °C. The
extract was clarified by centrifugation at 30,000g for 30 minutes and filtered through a
0.2-µm low protein binding filter. Total protein extract was applied to a
Heparin-Sepharose FPLC column (1.6 x 4.5 cm) equilibrated with zinc buffer. Proteins were
eluted with a 0-0.7 M NaCl gradient. Fractions containing zinc-finger protein were
identified by SDS-PAGE and pooled. Protein concentration was determined by the
Bradford method using BSA (fraction V) as a standard (Bradford, Anal. Biochem.,
72:248-254, 1976). The yield of purified protein was from 7 to 19 mg/liter of cell
culture. Protein was over 90% homogeneous as judged by SDS-PAGE.

Kinetic Analysis

The kinetic constants for the interactions between Zif268 peptides and their DNA targets
were determined by surface plasmon resonance based analysis using the BIAcore
instrument (Pharmacia) (Malmqvist, Curr. Opinion in Immuno., 5:282-286, 1993). The
surface of a sensor chip was activated with a mixture of EDCI and NHS for 15 minutes.
Then 40 µl of affinity purified streptavidin (Pierce), 200 µg/ml in 10 mM sodium acetate
(pH 4.5), was injected at a rate of 5 µl/min. Typically, 5000-6000 resonance units of
streptavidin were immobilized on the chip. Excess ester groups were quenched with 30
µl of 1 M ethanolamine. Oligonucleotides were immobilized onto the chip by injection
of 40 µl of biotinylated oligonucleotides (50 µg/ml) in 0.3 M sodium chloride.
Usually 1500-3000 resonance units of oligomers were immobilized. The association rate
(k_on) was determined by studying the rate of binding of the protein to the surface at 5
different protein concentrations ranging from 10 to 200 µg/ml in the zinc buffer. The
dissociation rate (k_off) was determined by increasing the flow rate to 20 µl/min after
the association phase. The k_off value is the average of three measurements. The k_on and
k_off values were calculated using Biacore® kinetics evaluation software. The equilibrium
dissociation constants were deduced from the rate constants.
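
The last step above, deducing an equilibrium dissociation constant from the two rate constants, can be sketched as follows. This is a minimal illustration that assumes simple 1:1 binding, so that K_d = k_off / k_on. The function name and the numerical values are hypothetical and are not taken from FIGURE 11.

```python
from statistics import mean


def equilibrium_dissociation_constant(k_on: float, k_off: float) -> float:
    """Return K_d in M, given k_on in M^-1 s^-1 and k_off in s^-1.

    Assumes a simple 1:1 binding model, for which K_d = k_off / k_on.
    """
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


# Hypothetical values for illustration only (not data from this document).
k_off_measurements = [0.9e-3, 1.0e-3, 1.1e-3]   # s^-1, three measurements
k_off_avg = mean(k_off_measurements)            # averaged as in the text
k_on_example = 1.0e5                            # M^-1 s^-1

k_d = equilibrium_dissociation_constant(k_on_example, k_off_avg)
print(f"K_d = {k_d * 1e9:.1f} nM")
```

Averaging k_off before dividing mirrors the statement that the reported k_off is the mean of three measurements; the instrument software may fit the data differently.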

EXAMPLE 9 

PHAGEMID DISPLAY OF MODIFIED ZINC FINGERS 

Library Design and Selection 

Phage display of the Zif268 protein was achieved by modification of the phagemid
display system pComb3 as described in Examples 2-6. The Zif268 sequence from pZif89
was tailored by PCR for insertion between the XhoI and SpeI sites of pComb3.5. As
described above in Example 4, insertion at these sites results in the fusion of Zif268 with
the carboxyl terminal segment of the filamentous phage coat protein III (pIII) gene. A
single panning experiment, which consists of incubating the phage displaying the zinc
finger protein with the target DNA sequence immobilized on a microtiter well, followed
by washing, elution, and titering of eluted phage, was utilized to examine the functional
properties of the protein displayed on the phage surface.

In control experiments, phage displaying Zif268 were examined in a panning experiment
for binding to a target sequence bearing its consensus binding site or the binding site of
the first three fingers of TFIIIA. These experiments showed that Zif268-displaying phage
bound the appropriate target DNA sequence 9-fold over the TFIIIA sequence or BSA and
demonstrated that sequence-specific binding of the finger complex is maintained during
phage display. A 4-fold reduction in phage binding was noted when Zn2+ and DTT were
not included in the binding buffer. Two reports verify that Zif268 can be displayed on
the phage surface (Rebar, et al., Science, 263:671-673, 1994; Jamieson, et al., Biochem.,
33:5689-5695, 1994).

In a similar experiment, the first three fingers of TFIIIA were displayed on the surface
of phage and also shown to retain specific binding activity. Immobilization of DNA was
facilitated by the design of stable hairpin sequences which present the duplex DNA target
of the fingers within a single oligonucleotide which was amino labeled (FIGURE 8B)
(Antao, et al., Nucleic Acids Research, 19:5901-5905, 1991). The hairpin DNA
containing the 9-bp consensus binding site (5'-GCGTGGGCG-3', as enclosed) of wild-type
Zif268 was used for affinity selection of phage-displayed zinc finger proteins. In
addition, the 3-bp subsites (boxed) of consensus HIV-1 DNA sequence were substituted
for wild-type Zif268 3-bp subsites for affinity selection.

The amino linker allowed for covalent coupling of the hairpin sequence to acetylated
BSA, which was then immobilized for selection experiments by adsorption to polystyrene
microtiter wells. Biotinylated hairpin sequences worked equally well for selection
following immobilization to streptavidin-coated plates.
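
The hairpin layout used for selection can be sketched in code. The sketch below assumes the arrangement read off the Z268N oligonucleotide given in Example 7: a 5' CCT clamp, the 9-bp binding site, a CCC stem extension, a TTTT loop, and then the reverse complement of everything before the loop. The function names and default arguments are hypothetical and are introduced only for illustration.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hairpin_oligo(site: str, clamp: str = "CCT", stem: str = "CCC",
                  loop: str = "TTTT") -> str:
    """Build a self-complementary hairpin presenting `site` as duplex DNA.

    Layout (5' to 3'): clamp + site + stem + loop + revcomp(clamp + site + stem).
    The defaults follow the Z268N sequence quoted in Example 7.
    """
    top = clamp + site.upper() + stem
    return top + loop + reverse_complement(top)


consensus = hairpin_oligo("GCGTGGGCG")   # wild-type Zif268 site
finger2_alt = hairpin_oligo("GCGTTGGCG")  # middle subsite changed to TTG
print(consensus, len(consensus))
print(finger2_alt, len(finger2_alt))
```

With the default clamp, stem, and loop, a 9-bp site always yields a 34-nucleotide oligonucleotide, which matches the hairpin length stated in Example 7.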

Libraries of each of the three fingers of Zif268 were independently constructed using the
previously described overlap PCR mutagenesis strategy (Barbas III, et al., Proc. Natl.
Acad. Sci. USA, 89:4457-4461, 1992 and EXAMPLES 2-6). Randomization was limited
to six positions due to constraints in the size of libraries which can be routinely
constructed (Barbas III, Curr. Opinion in Biotech., 4:526-530, 1993). Zinc finger protein
recognition of DNA involves an antiparallel arrangement of protein in the major groove
of DNA, i.e., the amino terminal region is involved in 3' contacts with the target
sequence whereas the carboxyl terminal region is involved in 5' contacts (FIGURE 8B).
Within a given finger/DNA subsite complex, contacts remain antiparallel: in finger
1 of Zif268, guanidinium groups of Arg at helix positions -1 and 6 hydrogen bond with
the 3' and 5' guanines, respectively, of the GCG target sequence. Contact with the central
base in a triplet subsite sequence by the side chain of the helix position 3 residue is
observed in finger 2 of Zif268, fingers 4 and 5 of GLI, and fingers 1 and 2 of TTK.
Within the three reported crystal structures of zinc-finger/DNA complexes, direct base
contact has been observed between the side chains of residues -1 to 6 with the exception
of 4 (Pavletich, supra; Pavletich, Science, 261:1701-1707, 1993; Fairall, et al., Nature,
366:483-487, 1993).

Based on these observations, residues corresponding to the helix positions -1, 2, 3, 4, 5,
and 6 were randomized in the finger 1 and 3 libraries. The Ser of position 1 was
conserved in these experiments since it is well conserved at this position in zinc finger
sequences in general and completely conserved in Zif268 (Jacobs, EMBO J., 11:4507-4517,
1992). In the finger 2 library, helix positions -2, -1, 1, 2, 3, and 4 were randomized
to explore a different mutagenesis strategy where the -2 position is examined, since both
Zif268 and GLI structures reveal this position to be involved in phosphate contacts and
since it will have a context effect on the rest of the domain. Residues 5 and 6 were fixed
since the target sequence TTG retained the 5' thymidine of the wild type TGG site.
Introduction of ligated DNA by electroporation resulted in the construction of libraries
consisting of 2 x 10^9, 6 x 10^8, and 7 x 10^8 independent transformants for finger libraries
1, 2, and 3, respectively. Each library results in the display of the mutagenized finger in
the context of the two remaining fingers of wild-type sequences.
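
The constraint that limits randomization to six positions can be illustrated with a short calculation. The sketch assumes that every randomized position is encoded by an NNK codon (N = A, C, G, or T; K = G or T), as in the doping example mentioned earlier, and that all variants are equally likely to be sampled, so that a Poisson approximation for library coverage applies. The function names are hypothetical, and the transformant count used is the 2 x 10^9 figure reported for the finger 1 library.

```python
import math

NNK_CODONS = 4 * 4 * 2   # N has 4 choices, K has 2 -> 32 codons per position
AMINO_ACIDS = 20         # NNK encodes all 20 amino acids plus the TAG amber codon


def library_diversity(randomized_positions: int) -> tuple[int, int]:
    """Return (DNA-level variants, protein-level variants) for NNK randomization."""
    return (NNK_CODONS ** randomized_positions,
            AMINO_ACIDS ** randomized_positions)


def expected_coverage(transformants: float, variants: int) -> float:
    """Expected fraction of variants present at least once.

    Poisson approximation assuming equiprobable variants: 1 - exp(-T / V).
    """
    return 1.0 - math.exp(-transformants / variants)


dna_variants, protein_variants = library_diversity(6)
coverage = expected_coverage(2e9, dna_variants)
print(f"DNA variants:     {dna_variants:,}")
print(f"Protein variants: {protein_variants:,}")
print(f"Expected DNA-level coverage at 2e9 transformants: {coverage:.0%}")
```

Each additional NNK position multiplies the DNA-level diversity by 32, so a seventh randomized position would already exceed libraries of this size by more than an order of magnitude.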

EXAMPLE 10

SEQUENCE ANALYSIS OF SELECTED FINGERS 

In order to examine the potential of modifying zinc fingers to bind defined targets and
to examine their potential in gene therapy, a conserved sequence within the HIV-1
genome was chosen as a target sequence. The 5' leader sequence of the HIV-1 HXB2 clone
at positions 106 to 121 relative to the transcriptional initiation start site represents one
of several conserved regions within HIV-1 genomes (Yu, et al., Proc. Natl. Acad. Sci.
USA, 90:6340-6344, 1993; Myers, et al., 1992). For these experiments, the 9 base pair
region, 113 to 121, shown in FIGURE 8B, was targeted.

Following selection for binding the native consensus or HIV-1 target sequences,
functional zinc fingers were rapidly identified with an immunoscreening assay.
Expression of the selected proteins in a pAraHA derivative resulted in the fusion of the
mutant Zif268 proteins with a peptide tag sequence recognized by a monoclonal antibody
(FIGURE 8A). Binding was determined in an ELISA format using crude cell lysates.
A qualitative assessment of specificity can also be achieved with this methodology, which
is sensitive to at least 4-fold differences in affinity. Several positive clones from each
selection were sequenced and are shown in FIGURE 9. The six randomized residues of
fingers 1 and 3 are at positions -1, 2, 3, 4, 5, and 6 in the α-helical region, and at -2, -1,
1, 2, 3, and 4 in finger 2 (FIGURE 9). The three nucleotides denote the binding site used
for affinity selection of each finger. Proteins studied in detail are indicated with a clone
designation.
Finger 1 selection with the consensus binding site GCG revealed a strong selection for
Lys at position -1 and Arg at position 6. Covariation between positions -1 and 2 is
observed in three clones which contain Lys and Cys at these positions, respectively.
Clone C7 was preferentially enriched in the selection based on its occurrence in 3 of the
12 clones sequenced. Selection against the HIV-1 target sequence in this region, TGT,
revealed a diversity of sequences with a selection for residues with hydrogen-bonding
side chains in position -1 and a modest selection for Gln at position 3. Finger 2 selection
against the consensus TGG subsite showed a selection for an aromatic residue at -1
whereas selection against the HIV-1 target TTG demonstrated a selection for a basic
residue at this position. The preference for Ser at position 3 may be relevant in the
recognition of thymidine. Contact of thymine with Ser has been observed in the GLI and
TTK structures (Pavletich, supra; Fairall, et al., supra). Other modest selections towards
consensus residues can be observed within the table. Selections were performed utilizing
a supE strain of E. coli, which resulted in the reading of the amber codon TAG as Gln
during translation. Of the 51 sequences presented in FIGURE 9, 14 clones possessed a
single amber codon. No clones possessed more than one amber codon. Selection for
suppression of the amber stop codon in supE strains has been noted in other DNA binding
protein libraries and likely improves the quality of the library since this residue is
frequently used as a contact residue in DNA binding proteins (Huang, et al., Proc. Natl.
Acad. Sci. USA, 91:3969-3973, 1994). Selection for fingers containing free cysteines is
also noted and likely reflects the experimental protocol. Phage were incubated in a
buffer containing Zn2+ and DTT to maximize the number of phage bearing properly
folded fingers. Selection against free cysteines, presumably due to aggregation or
improper folding, has been noted previously in phage display libraries of other proteins
(Lowman, et al., J. Mol. Biol., 224:564-578, 1993).

For further characterization, high level expression of zinc finger proteins was achieved
using the T7 promoter (FIGURE 10) (Studier, et al., supra). In FIGURE 10, proteins
were separated by 15% SDS-PAGE and stained with Coomassie brilliant blue. Lane 1:
molecular weight standards (kDa). Lane 2: cell extract before IPTG induction. Lane 3:
cell extract after IPTG induction. Lane 4: cytoplasmic fraction after removal of inclusion
bodies by centrifugation. Lane 5: inclusion bodies containing zinc finger peptide. Lane
6: mutant Zif268 peptide purified by Heparin-Sepharose FPLC. Clones C10, F8, and G3
each possessed an amber codon which was converted to CAG to encode Gln prior to
expression in this system.



EXAMPLE 11 

CHARACTERIZATION OF AFFINITY AND SPECIFICITY 

In order to gain insight into the mechanism of altered specificity or affinity, the kinetics
of binding were determined using real-time changes in surface plasmon resonance (SPR)
(Malmqvist, supra). The kinetic constants and calculated equilibrium dissociation
constants of 11 proteins are shown in FIGURE 11. Each zinc finger protein studied is
indicated by a clone designation (for its sequence, see FIGURE 9). The target DNA site
used for selection of each finger is indicated in bold face. The consensus binding site for
the wild type protein is also shown in bold. The non-hairpin duplex DNA (underlined)
was prepared by annealing two single-stranded DNAs. The association rate (k_on),
dissociation rate (k_off), and equilibrium dissociation constant (K_d) for each protein are given.

The calculated equilibrium dissociation constants for Zif268 binding to its consensus
sequence in the form of the designed hairpin or a linear duplex lacking the tetrathymidine
loop are virtually identical, suggesting that the conformation of the duplex sequence
recognized by the protein is not perturbed within the hairpin. The value
of 6.5 nM for Zif268 binding to its consensus is comparable to the range of 0.5 to 6 nM
reported using electrophoretic mobility shift assays for this protein binding to its consensus
sequence within oligonucleotides of different length and sequence (Pavletich, supra;
Rebar, supra; Jamieson, et al., supra).

As a measure of specificity, the affinity of each protein was determined for binding to
the native consensus sequence and a mutant sequence in which one finger subsite had
been changed. FIGURE 11 shows the determination of the dissociation rate (k_off) of
wild-type Zif268 protein (WT) and its variant C7 by real-time changes in surface plasmon
resonance. The response of the instrument, r, is proportional to the concentration of the
protein-DNA complex. Since dr/dt = -k_off r when [protein] = 0, then
k_off = ln(r_t1/r_tn)/(t_n - t_1), where r_tn is the response
at time t_n. The results of a single experiment for each protein are shown. Three
experiments were performed to produce the values shown in FIGURE 11. Clone C7 is
improved 13-fold in affinity for binding the wild-type sequence GCG. The major
contribution to this improvement in affinity is a 5-fold slowing of the dissociation rate
of the complex (FIGURE 12). Specificity of the C7 protein is also improved 9-fold with
respect to the HIV-1 target sequence. This result suggests that additional or improved
contacts are made in the complex. Studies of protein C9 demonstrate a different
mechanism of improved specificity. In this case the overall affinity of C9 for the GCG
site is equivalent to that of Zif268, but the specificity is improved 3-fold over Zif268 for
binding to the TGT target site by an increase in the off-rate of this complex. Characterization
of proteins F8 and F15 demonstrates that the 3 base pair recognition subsite of finger 1 can
be completely changed to TGT and that new fingers can be selected to bind this site.
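
The dissociation-rate relation quoted above follows from first-order decay of the complex once free protein is removed: r(t) = r(t_1) exp(-k_off (t - t_1)). A minimal sketch of the two-point estimate is given below. The function name and the synthetic response values are hypothetical; the curves recorded by the instrument would normally be fitted over many points rather than two.

```python
import math


def k_off_from_decay(r_t1: float, t_1: float, r_tn: float, t_n: float) -> float:
    """Estimate k_off (s^-1) from two points on a dissociation curve.

    Uses k_off = ln(r_t1 / r_tn) / (t_n - t_1), which holds for first-order
    decay of the complex when [protein] = 0.
    """
    if t_n <= t_1:
        raise ValueError("t_n must be later than t_1")
    if r_t1 <= 0 or r_tn <= 0:
        raise ValueError("responses must be positive")
    return math.log(r_t1 / r_tn) / (t_n - t_1)


# Synthetic check with a hypothetical true k_off (not data from this document).
true_k_off = 0.01                      # s^-1
r0 = 1000.0                            # response units at t = 0
r_at = lambda t: r0 * math.exp(-true_k_off * t)

estimate = k_off_from_decay(r_at(10.0), 10.0, r_at(110.0), 110.0)
half_life = math.log(2) / estimate     # time for the response to halve
print(f"k_off = {estimate:.4f} s^-1, half-life = {half_life:.1f} s")
```

A 5-fold slower dissociation rate, as reported for C7, corresponds to a 5-fold longer half-life of the complex under this model.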

Characterization of proteins modified in the finger 2 domain and selected to bind the
TTG subsite reveals that the specificity of this finger is amenable to modification. Proteins
G4 and G6 bind an oligonucleotide bearing the new subsite with affinities equivalent to
Zif268 binding its consensus target. Specificity of these proteins for the target on which
they were selected to bind is demonstrated by an approximately 4-fold better affinity for
this oligonucleotide as compared to the native binding site, which differs by a single base
pair. This level of discrimination is similar to that reported for a finger 1 mutant
(Jamieson, et al., supra). The finger 3 modified protein A14 was selected to bind the
native finger 3 subsite and binds this site with an affinity which is only 2-fold lower than
Zif268. Note that protein A14 differs radically in sequence from the native protein in the
recognition subsite. Sequence specificity in 10 of the 11 proteins characterized was
provided by differences in the stability of the complex. Only a single protein, G6,
achieved specificity by a dramatic change in on-rate. Examination of on-rate variation
with charge variation of the protein did not reveal a correlation.

EXAMPLE 12 
DIMERIC ZINC FINGER CONSTRUCTION 

Zinc finger proteins of the invention can be manipulated to recognize and bind to
extended target sequences. For example, zinc finger proteins containing from about 2
to 12 zinc fingers, Zif(2) to Zif(12), may be fused to the leucine zipper domains of the
Jun/Fos proteins, prototypical members of the bZIP family of proteins (O'Shea, et al.,
Science, 254:539, 1991). Alternatively, zinc finger proteins can be fused to other
proteins which are capable of forming heterodimers and contain dimerization domains.
Such proteins will be known to those of skill in the art.

The Jun/Fos leucine zippers preferentially form heterodimers and allow for the
recognition of 12 to 72 base pairs. Henceforth, Jun/Fos refers to the leucine zipper
domains of these proteins. Zinc finger proteins are fused to Jun, and independently to
Fos, by methods commonly used in the art to link proteins. Following purification of the
Zif-Jun and Zif-Fos constructs (FIGURES 13 and 14, respectively), the proteins are mixed
to spontaneously form a Zif-Jun/Zif-Fos heterodimer. Alternatively, coexpression of the
genes encoding these proteins results in the formation of Zif-Jun/Zif-Fos heterodimers
in vivo. Fusion with an N-terminal nuclear localization signal allows for targeting of
expression to the nucleus (Calderon, et al., Cell, 41:499, 1982). Activation domains may
also be incorporated into one or each of the leucine zipper fusion constructs to produce
activators of transcription (Sadowski, et al., Gene, 118:137, 1992). These dimeric
constructs then allow for specific activation or repression of transcription. These
heterodimeric Zif constructs are advantageous since they allow for recognition of
palindromic sequences (if the fingers on both Jun and Fos recognize the same DNA/RNA
sequence) or extended asymmetric sequences (if the fingers on Jun and Fos recognize
different DNA/RNA sequences). For example, the palindromic sequence
5' - CGC CCA CGC [N]x GCG TGG GCG - 3'
3' - GCG GGT GCG [N]x CGC ACC CGC - 5' (SEQ ID NO: 37)



is recognized by the Zif268-Fos/Zif268-Jun dimer (x is any number). The spacing
between subsites is determined by the site of fusion of Zif with the Jun or Fos zipper
domains and the length of the linker between the Zif and zipper domains. Subsite
spacing is determined by a binding site selection method as is common to those skilled
in the art (Thiesen, et al., Nucleic Acids Research, 18:3203, 1990). An example of the
recognition of an extended asymmetric sequence is shown by the Zif(C7)6-Jun/Zif268-Fos
dimer. This protein consists of 6 fingers of the C7 type (EXAMPLE 11) linked to Jun
and three fingers of Zif268 linked to Fos, and recognizes the extended sequence:



5' - CGC CGC CGC CGC CGC CGC [N]x GCG TGG GCG - 3'
3' - GCG GCG GCG GCG GCG GCG [N]x CGC ACC CGC - 5'
(SEQ ID NO: 38)
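
Both dimer targets shown in this example follow one pattern: the reverse complement of one partner's binding site, a spacer of x unspecified bases, and then the other partner's binding site. A minimal sketch of that pattern is given below. The function and parameter names are hypothetical, the spacer length is left as a free parameter because the text determines it experimentally, and the sketch does not say which half-site belongs to the Jun or the Fos fusion.

```python
# Complement table including N so that spacer positions are preserved.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def complement(seq: str) -> str:
    """Base-by-base complement, read in the same left-to-right order."""
    return seq.upper().translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return complement(seq)[::-1]


def dimer_target(left_partner_site: str, right_partner_site: str,
                 spacer_len: int) -> str:
    """Top strand (5' to 3') recognized by a zipper-linked dimer.

    The left partner's site is presented on the opposite strand, so its
    reverse complement appears in the top strand.
    """
    return (reverse_complement(left_partner_site)
            + "N" * spacer_len
            + right_partner_site.upper())


zif268_site = "GCGTGGGCG"
palindromic = dimer_target(zif268_site, zif268_site, spacer_len=2)
asymmetric = dimer_target("GCG" * 6, zif268_site, spacer_len=2)

for top in (palindromic, asymmetric):
    print("5' -", top, "- 3'")
    print("3' -", complement(top), "- 5'")
```

With identical half-sites the result is its own reverse complement, which is the sense in which the first target is palindromic.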



EXAMPLE 13

CONSTRUCTION OF MULTIFINGER PROTEINS UTILIZING REPEATS
OF THE FIRST FINGER OF ZIF268

Following mutagenesis and selection of variants of the Zif268 protein in which the finger
1 specificity or affinity was modified (see EXAMPLE 7), proteins carrying multiple
copies of the finger may be constructed using the TGEKP linker sequence by methods
known in the art. For example, the C7 finger may be constructed according to the
scheme:

MKLLEPYACPVESCDRRFSKSADLKRHIRIHTGEKP-
(YACPVESCDRRFSKSADLKRHIRIHTGEKP)1-11, where the sequence of the last linker
is subject to change since it is at the terminus and not involved in linking two fingers
together. An example of a three finger C7 construction is shown in FIGURE 15. This
protein binds the designed target sequence GCG-GCG-GCG (SEQ ID NO: 32) in the
oligonucleotide hairpin CCT-CGC-CGC-CGC-GGG-TTT-TCC-CGC-GGC-GGC-GAG-G
with an affinity of 9 nM, as compared to an affinity of 300 nM for an oligonucleotide
encoding the GCG-TGG-GCG sequence (as determined by surface plasmon resonance
studies). Proteins containing 2 to 12 copies of the C7 finger have been constructed and
shown to have specificity for their predicted targets as determined by ELISA (see, for
example, Example 7). Fingers utilized need not be identical and may be mixed and
matched to produce proteins which recognize a desired target sequence. These may also
be utilized with leucine zippers (e.g., Fos/Jun) to produce proteins with extended
sequence recognition.

In addition to producing polymers of finger 1, the entire three finger Zif268 and modified
versions thereof may be fused using the consensus linker TGEKP to produce proteins
with extended recognition sites. For example, FIGURE 16 shows the sequence of the
protein Zif268-Zif268 in which the natural protein has been fused to itself using the
TGEKP linker. This protein now binds the sequence GCG-TGG-GCG-GCG-TGG-GCG
as demonstrated by ELISA. Therefore modifications within the three fingers of Zif268
... extended sequences. These
new zinc proteins may also be used in combination with leucine zippers if desired, as
described in Example 12.

EXAMPLE 14

DESIGN OF A LINKER PEPTIDE

Coordinates for the Zif268-DNA complex were obtained from the Brookhaven Protein
Data Bank. Model building was done with INSIGHT II (Biosym Technologies, San
Diego, California). A continuous 20 bp double-stranded DNA molecule with a six-finger
binding site (18 bp) was built from the coordinates for the DNA strands in the Zif268
complex (Pavletich, N.P. & Pabo, C.O. (1991) Science 252, 809-817). Two molecules
of the three-finger protein were re-introduced, one onto each 9 bp half-site, by overlapping
the Zif268-DNA complex onto the modeled DNA. It was apparent that the linker length
required to connect the F3 α-helix to the first β-strand of F4 was compatible with the
length of the natural linker peptides, TGQKP and TGEKP. Hence, a peptide linker,
TGEKP, was constructed between F3 and F4 after trimming off the extra residues from
the C- and N-termini of F3 and F4, respectively. The linker was built so as to
maintain the positioning and hydrogen bond characteristics observed in the two natural
linker regions of Zif268.

In order to explore the possibility of connecting two three-finger protein molecules with
a linker peptide, computer modeling studies were performed based on the structure of the
three zinc finger Zif268-DNA complex, supra. A six-finger-DNA complex, modeled by
connecting finger 3 (F3) of Zif268 to finger 1 of a second Zif268 molecule (henceforth
designated finger 4; F4), would help determine the length and sequence of a compatible
linker peptide to be used in the construction of six-finger proteins. Study of the model
suggested that it should be possible to produce a six-finger protein with a Thr-Gly-Glu-Lys-Pro
(TGEKP) pentapeptide linker between F3 and F4 and that this polydactyl protein
would most likely bind DNA containing the 18-nucleotide site
5'-GCGTGGGCGGCGTGGGCG-3'. This pentapeptide constitutes the consensus peptide
most commonly found linking zinc finger domains in natural proteins. Prior to
construction of the model, the consensus peptide TGEKP was considered insufficient to
keep the periodicity of the zinc finger domains in concert with that of the DNA over this
extended sequence, since no natural zinc finger proteins have been demonstrated to bind
DNA with more than three contiguous zinc finger domains, even though natural proteins
containing more than three zinc finger domains are quite common. Comparative studies
of the constructed TGEKP linker with the natural linkers observed in the Zif268 structure
indicated that this linker is as optimal a linker peptide as any novel linker sequence that
could be designed. In binding this extended site, the modeled six-fingered protein
follows the major groove of DNA for approximately two turns of the helix. Such
extended contiguous binding within the major groove of DNA has not been observed
with any known DNA-binding protein.
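
As a rough check on the geometry described above, the following Python sketch assembles the 18 bp C7-C7 target from two 9 bp half-sites and estimates how many helical turns it spans. The value of 10.5 bp per turn for B-form DNA is a textbook assumption, not a figure given in this document, and all function and variable names are illustrative only.

```python
# Sketch: build the 18 bp six-finger site from two 9 bp half-sites and
# estimate how many turns of the double helix it covers.

BP_PER_FINGER = 3    # each zinc finger is taken to contact a 3 bp subsite
BP_PER_TURN = 10.5   # assumption: textbook value for B-form DNA

C7_HALF_SITE = "GCGTGGGCG"  # 9 bp site bound by the three-finger C7 protein


def six_finger_site(first_half: str, second_half: str) -> str:
    """Concatenate two 9 bp half-sites into one contiguous 18 bp site."""
    return first_half + second_half


def helical_turns(site: str, bp_per_turn: float = BP_PER_TURN) -> float:
    """Approximate number of helical turns spanned by a binding site."""
    return len(site) / bp_per_turn


site = six_finger_site(C7_HALF_SITE, C7_HALF_SITE)
fingers = len(site) // BP_PER_FINGER
print(site, fingers, round(helical_turns(site), 2))
```

Under the stated assumption the site spans somewhat less than two full turns, which is in line with the "approximately two turns" described in the text.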

Plasmid Construction. The six-finger protein, C7-C7, was constructed by linking two
C7 proteins with the TGEKP linker peptide. Two C7 DNA fragments were created by
polymerase chain reaction (PCR) with two different sets of primers using pET3a-C7 as
template (Wu, H., Yang, W.P. & Barbas, C.F., III (1995) Proc. Natl. Acad. Sci. USA 92,
344-348), so that the 5' C7 was flanked by XhoI and Cfr10I sites at the 5' and 3' ends,
respectively, and the 3' C7 was flanked by Cfr10I and SpeI sites. The primer pairs for
the generation of the 5' C7 are
5'-GAGGAGGAGGAGGGATCCATGCTCGAGCTCCCCTATGCTTGCCCTG-3' and
5'-GAGGAGGAGACCGGTATGGATTTTGGTATGCCTCTTGCG-3'; and for the 3' C7 are
5'-GAGGAGGAGACCGGTGAGAAGCCCTATGCTTGCCCTGTCGAGTCCTGCGATCGCCGC-3' and
5'-GAGGAGGAGACTAGTTCTAGAGTCCTTCTGTC-3'. These two C7 DNA fragments were
then ligated into a pGEX-2T (Pharmacia) vector which had been modified with XhoI and
SpeI sites introduced between the pGEX-2T cloning sites BamHI and EcoRI. The Cfr10I
enzyme site between the two C7 fragments encodes amino acids TG, part of the TGEKP
linker peptide. The fidelity of the C7-C7 sequence was determined by DNA sequencing.
The C7-C7 DNA fragment was then cut out from the pGEX-2T construct with XhoI/SpeI
and cloned into a modified pMal-c2 (New England Biolabs) bacterial expression vector
for the expression of C7-C7 maltose fusion proteins. For transfection experiments, the
C7-C7 DNA fragment was removed via BamHI/EcoRI excision and ligated into the
corresponding sites of pcDNA3, a eukaryotic expression vector (Invitrogen, San Diego,
CA). As in the generation of the C7-C7 protein, the Sp1C-C7 protein was created by
linking the PCR products of Sp1C (17) and C7, which were flanked with XhoI/Cfr10I and
Cfr10I/SpeI sites, respectively. The Sp1C-C7 fragment was then ligated into the pcDNA3
eukaryotic expression vector or into the pMal-c2 bacterial expression vector. The DNA
sequence of the Sp1C-C7 protein was confirmed by DNA sequencing.
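
The way the Cfr10I junction encodes the start of the linker can be checked directly from the primer sequences. The sketch below is illustrative only: it assumes the recognition sequences XhoI = CTCGAG, Cfr10I = RCCGGY and SpeI = ACTAGT, uses the standard genetic code, and the helper names (`find_site`, `translate`) are hypothetical.

```python
import re

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Assumed recognition sequences, written as regular expressions
# (R = A or G, Y = C or T).
SITES = {"XhoI": "CTCGAG", "Cfr10I": "[AG]CCGG[CT]", "SpeI": "ACTAGT"}

# Primers as given in the text (forward primers for the 5' and 3' C7
# fragments, and the reverse primer for the 3' C7 fragment).
FWD_5PRIME_C7 = "GAGGAGGAGGAGGGATCCATGCTCGAGCTCCCCTATGCTTGCCCTG"
FWD_3PRIME_C7 = "GAGGAGGAGACCGGTGAGAAGCCCTATGCTTGCCCTGTCGAGTCCTGCGATCGCCGC"
REV_3PRIME_C7 = "GAGGAGGAGACTAGTTCTAGAGTCCTTCTGTC"


def find_site(seq: str, enzyme: str) -> int:
    """Return the index of the first match of an enzyme site, or -1."""
    match = re.search(SITES[enzyme], seq)
    return match.start() if match else -1


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, starting at position 0."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Read five codons starting at the Cfr10I site of the 3' C7 forward primer.
start = find_site(FWD_3PRIME_C7, "Cfr10I")
linker = translate(FWD_3PRIME_C7[start:start + 15])
print(start, linker)
```

Reading in frame from the Cfr10I site, the first two codons give Thr-Gly and the next three complete the pentapeptide, which matches the statement that the site encodes the TG portion of the TGEKP linker.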

For reporter gene assays of activation, the reporter genes were constructed by inserting
six forward tandem repeats of the individual binding sites into the NheI site
upstream of the SV40 promoter of pGL3-promoter (Promega). In the reporter gene
assays for repression, six forward tandem copies of the C7-C7 binding sites were placed
upstream of the SV40 promoter at the NheI site of pGL3-control (Promega).

Expression and Purification of Zinc-Finger Proteins. Proteins were overexpressed as
fusions with the maltose binding protein using the maltose fusion and purification system
(New England Biolabs). The maltose fusion proteins were purified using an amylose
resin affinity column according to the manufacturer's instructions. Fusion proteins
were determined to be greater than 90% homogeneous as demonstrated by Coomassie
blue stained SDS/PAGE gels. Protein concentrations were determined by amino acid
analysis.

Gel Mobility Shift Assays. To produce probes used in the gel mobility shift assay,
double-stranded oligonucleotides containing TCGA overhangs at the 5' end of each
strand were labeled with α-32P-dATP. The sequences of the primary strands within the
duplex regions were
5'-GATGTATGTAGCGTGGGCGGCGTGGGCGTAAGTAATGC-3' (C7-C7 site),
5'-GATGTATGTAGCGTGGGCGGGGGCGGGGTAAGTAATGC-3' (Sp1C-C7 site),
5'-GATGTATGTAGCGGCGGCGGCGGCGGCGTAAGTAATGC-3' ((GCG)6 site),
5'-GATGTATGTAGCGTGGGCGTAAGTAATGC-3' (C7 site), and
5'-GATGTATGTAGGGGCGGGGTAAGTAATGC-3' (Sp1C site). For each binding
reaction, 1.2 µg of poly(dI-dC) and 30 pM of labeled oligonucleotide were incubated with
the C7-C7 maltose fusion protein (MBP-C7-C7) or the Sp1C-C7 maltose fusion protein
(MBP-Sp1C-C7) in 20 µl of 1x Binding Buffer (10 mM Tris-Cl, pH 7.5, 100 mM KCl,
1 mM MgCl2, 1 mM DTT, 0.1 mM ZnCl2, 10% glycerol, 0.02% NP-40, 0.02% BSA) for
30 minutes at room temperature. The reaction mixtures were then run on a 5%
nondenaturing polyacrylamide gel with 0.5x TBE buffer at room temperature. The
radioactive signals were quantitated with a PhosphorImager (Molecular Dynamics) and
recorded on X-ray films. The data were then fit using the KaleidaGraph program (Synergy
Software, Reading, PA) to give the equilibrium dissociation constants.
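
The text does not state which binding model was fitted in KaleidaGraph. As an illustration only, the sketch below assumes a simple one-site isotherm, fraction bound = [P] / (Kd + [P]), with protein in large excess over the labeled probe, and estimates Kd by a least-squares scan over a logarithmic grid. The titration values are synthetic and the function names are hypothetical.

```python
# Sketch: estimate an equilibrium dissociation constant (Kd) from gel shift
# data under an assumed one-site binding model.

def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """One-site isotherm: fraction of probe shifted at a protein concentration."""
    return protein_nM / (kd_nM + protein_nM)


def fit_kd(protein_nM, bound, lo=1e-3, hi=1e3, steps=6000):
    """Scan Kd over a log-spaced grid and return the least-squares best value."""
    best_kd, best_sse = lo, float("inf")
    for i in range(steps + 1):
        kd = lo * (hi / lo) ** (i / steps)
        sse = sum((fraction_bound(p, kd) - b) ** 2
                  for p, b in zip(protein_nM, bound))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic titration (hypothetical numbers, not data from this document).
TRUE_KD = 2.0
conc = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
observed = [fraction_bound(c, TRUE_KD) for c in conc]

kd_estimate = fit_kd(conc, observed)
print(round(kd_estimate, 2))
```

A fold enhancement in affinity, such as the comparison between six-finger and three-finger proteins, would then be the ratio of two Kd values fitted in this way.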

DNase I Footprinting Analysis. DNase I footprinting was performed using the SureTrack
Footprinting Kit (Pharmacia) according to the manufacturer's instructions. Two 220 bp
DNA fragments containing single C7-C7 and Sp1C-C7 binding sites were synthesized by
PCR fusion reactions and then cloned into the pcDNA3 vector. Two sets of primers,
1) EcoRIfootF, 5'-GAGGAGGAGGAATTCCGACATTTATAATGAACGTGAATTGC-3', and
C7-C73>5, 5'-TGCGCCCACGCCGCCCACGCGATGATTGGGAGCTTTTTTTGCACG-3'; and
2) C7-C75>3, 5'-TCGCGTGGGCGGCGTGGGCGCAAAAAATTATTATCATGGATTCTAAAACGG-3',
and NotIfootB, 5'-GAGGAGGAGGCGGCCGCAGGTAGATGAGATGTGACGAACGTG-3', were
used with pGL3-promoter (Invitrogen, CA) as template to generate the two
overlapping subfragments of the C7-C7 footprinting probe. The two PCR
products were then used as template with EcoRIfootF and NotIfootB as primers to
generate the 220 bp C7-C7 footprinting probe. The footprinting probe containing the
Sp1C-C7 binding site was constructed in the same way as the C7-C7 probe, except that
the oligos Sp1C-C73>5,
5'-TGCCCCGCCCCCGCCCACGCGATGATTGGGAGCTTTTTTTGCACG-3', and Sp1C-C75>3,
5'-TCGCGTGGGCGGGGGCGGGGCAAAAAATTATTATCATGGATTCTAAAACGG-3', were used
to replace the C7-C73>5 and C7-C75>3 oligos. pcDNA3 vectors containing the binding
sites for C7-C7 or Sp1C-C7 were then digested with EcoRI and NotI. The 220 bp
fragments were gel purified and end-labeled using Klenow polymerase and 32P-dATP.
Because there are no thymines in the NotI site, only the strand extended at the EcoRI
site is radiolabeled. Approximately 2.3 x 10^4 cpm (0.1 pM) was then used in a 50 µl
binding reaction containing 20 µg/ml of either BSA or purified binding protein (300 nM)
in 1x Binding Buffer (10 mM Tris-Cl, pH 7.5, 100 mM KCl, 1 mM MgCl2, 1 mM DTT,
0.1 mM ZnCl2, 10% glycerol, 0.02% NP-40, 0.02% BSA) and 60 µg/ml poly(dI-dC) DNA.
Optimal binding conditions were determined from gel shift assays. This reaction was
incubated for 30 minutes at room temperature prior to the addition of 1 U DNase I.
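
The single-end labeling argument above can be made explicit with a short sketch. It assumes the usual cut positions (EcoRI G^AATTC, leaving a 5'-AATT overhang; NotI GC^GGCCGC, leaving a 5'-GGCC overhang) and that the polymerase fills in the recessed 3' end by copying the overhang. The names used are illustrative.

```python
# Sketch: which nucleotides are incorporated when a 5' overhang is filled in,
# and therefore which end can take up labeled dATP.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Assumed 5' overhangs (written 5'->3') left by each enzyme.
OVERHANGS = {"EcoRI": "AATT", "NotI": "GGCC"}


def filled_in_nucleotides(overhang: str) -> str:
    """Nucleotides added (5'->3') when the overhang is used as template."""
    return "".join(COMPLEMENT[base] for base in reversed(overhang))


def is_labeled_by_dATP(enzyme: str) -> bool:
    """True if fill-in at this end incorporates at least one dA."""
    return "A" in filled_in_nucleotides(OVERHANGS[enzyme])


for enzyme in OVERHANGS:
    print(enzyme, filled_in_nucleotides(OVERHANGS[enzyme]), is_labeled_by_dATP(enzyme))
```

Because the NotI overhang contains no thymine to template a dA, only the EcoRI end incorporates the label under these assumptions.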

Luciferase Reporter Gene Assays. For the reporter gene assay experiments, 2.5 µg
of the individual reporter DNA and 2.5 µg of the C7-C7-VP16 expression plasmids
were transfected by the calcium phosphate method (Brasier, A.R., Tate, J.E. & Habener,
J.F. (1989) BioTechniques 7, 1116-1122) into HeLa cells which had been passaged the
day before at 3 x 10^5 per well of a six-well culture plate. Eighteen hours later, the cells
were washed and replenished with Dulbecco's Modified Eagle's Medium containing
10% newborn calf serum (Gibco-BRL). Two days later, the cells were washed, lysed,
and assayed for luciferase activity using Wallac's 96-well LB96 luminometer with
the luciferase assay system (Promega). The internal β-galactosidase activity control
was measured using a β-galactosidase reporter gene assay system (Tropix, MA).

Characterization of Affinity and Specificity of Two Six-Finger Proteins. To test
our model we constructed two six-finger proteins. In the first protein, designated
C7-C7, two copies of C7, a phage display selected Zif268 variant (supra), were linked
together via the TGEKP peptide. A second six-finger protein, Sp1C-C7, combines a
designed variant of the three-finger Sp1 transcription factor, Sp1C (Shi, Y. & Berg,
J.M. (1995) Chem. Biol. 2, 83-89), with the three-finger C7. The C7-C7
and Sp1C-C7 proteins were overexpressed in Escherichia coli as fusions with
maltose-binding protein (MBP) and purified. The affinities and specificities of these
proteins were determined by electrophoretic mobility shift assays (FIGURE 17). The
results of these studies are given in Table 2.

The six-finger proteins C7-C7 and Sp1C-C7 bind their 18 bp target sequences,
5'-GCGTGGGCGGCGTGGGCG-3' and 5'-GCGTGGGCGGGGGCGGGG-3',
respectively, with 68- to 74-fold enhanced affinity relative to the three-finger C7 or
Sp1C proteins. To examine the specificity of the C7-C7 protein, we studied its
binding to probes containing 4 bp differences in one half-site (Sp1C-C7 probe;
5'-GCGTGGGCGGGGGCGGGG-3') and 2 bp differences in each of the finger 2 and 5
binding sites ((GCG)6 probe; 5'-GCGGCGGCGGCGGCGGCG-3'). These studies
revealed a 5-fold preference for the designed target probe relative to the Sp1C-C7
probe and a 37-fold preference over the (GCG)6 probe. This, together with binding
studies using a probe containing the 9 bp C7 half-site, 5'-GCGTGGGCG-3',
demonstrates that mutations spread across the binding site are more disruptive to
binding than ones which occur at one end of the binding site. This behavior is
expected of polydactyl proteins because mutations within a given finger binding site
should affect the ability of both neighboring fingers to obtain their optimal mode of
binding. Similar results were obtained for the Sp1C-C7 protein (Table 2). To further
examine the binding of the C7-C7 and Sp1C-C7 proteins, DNase I footprinting assays
were performed (FIGURE 18). These studies demonstrated that both MBP fusions
protected DNA binding sites slightly greater than the 18 bp site which is bound
sequence specifically. This is most likely due to steric blocking by the MBP fusion at
the N-terminus of the protein and a decapeptide epitope tag at the C-terminus of the
protein.
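
The mismatch counts quoted for the two specificity probes can be reproduced by comparing the 18 bp sites triplet by triplet. This is a minimal sketch; the triplets are listed 5' to 3' along the sequence as written, and no assignment of a particular triplet to a particular finger number is made in the code.

```python
# Sketch: count base differences per 3 bp subsite between the designed
# C7-C7 target and the two specificity probes.

SITES = {
    "C7-C7": "GCGTGGGCGGCGTGGGCG",
    "Sp1C-C7": "GCGTGGGCGGGGGCGGGG",
    "(GCG)6": "GCGGCGGCGGCGGCGGCG",
}


def mismatches_per_triplet(target: str, probe: str) -> list:
    """Number of differing bases in each consecutive 3 bp subsite."""
    if len(target) != len(probe):
        raise ValueError("sites must have equal length")
    return [sum(a != b for a, b in zip(target[i:i + 3], probe[i:i + 3]))
            for i in range(0, len(target), 3)]


target = SITES["C7-C7"]
for name in ("Sp1C-C7", "(GCG)6"):
    per_triplet = mismatches_per_triplet(target, SITES[name])
    print(name, per_triplet, sum(per_triplet))
```

The Sp1C-C7 probe differs at four positions confined to one half-site, whereas the (GCG)6 probe differs at two positions in the middle triplet of each half-site, so the same total number of changes is either clustered or spread across the site.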

Transcriptional Activation and Repression. To examine the specificity of the
six-finger proteins in living cells, we constructed eukaryotic expression vectors which
fuse the C7-C7 and Sp1C-C7 proteins to the nuclear localization signal from the
SV40 large T antigen (Pro-Lys-Lys-Arg-Lys-Val) (Kalderon, D., Roberts, B.L.,
Richardson, W.D. & Smith, A.E. (1984) Cell 39, 499-509) and the transcriptional
activation domain from the herpes simplex virus VP16 protein. These plasmids were
cotransfected into the human HeLa cell line with reporter plasmids expressing the
firefly luciferase gene under control of the SV40 promoter (pGL3-promoter). The
reporter plasmids were constructed with C7-C7, Sp1C-C7, C7, and (GCG)6 binding
sites placed upstream of the SV40 promoter. The results of these studies with the
C7-C7 protein are given in FIGURE 19. Both C7-C7 and Sp1C-C7 stimulated the
activity of the promoter in a dose-dependent fashion. In the C7-C7 case, a >300-fold
stimulation of expression above background was observed for plasmids containing the
C7-C7 binding site, while a similar concentration of protein stimulated expression of
plasmids containing the C7 and Sp1C-C7 sites only about 3-fold. The in vivo specificity
of this protein, indicated by an approximately 100-fold activation of the reporter plasmid
bearing the proper binding site over plasmids containing a variant of the binding site,
exceeds that determined in the in vitro binding assays described in Table 1 by
approximately 5- to 10-fold. This enhanced specificity may be due to interactions
generated by the maltose binding protein at the N-terminus of the C7-C7 fusion
protein which was used in the in vitro binding assays. Difficulty in producing the
purified protein in a fully folded natural state may also contribute to the reduced
specificity in the in vitro assays.
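
The "approximately 100-fold" in vivo specificity follows from the two fold-activation values quoted above. The short sketch below only restates that arithmetic; because the on-target value is reported as a lower bound (>300-fold), the ratio should be read as approximate. The function names are illustrative.

```python
# Sketch: specificity as the ratio of on-target to off-target fold activation.

def fold_activation(signal: float, background: float) -> float:
    """Reporter signal relative to background."""
    return signal / background


def specificity_ratio(on_target_fold: float, off_target_fold: float) -> float:
    """How much more strongly the proper site is activated than a variant."""
    return on_target_fold / off_target_fold


# Values quoted in the text: >300-fold on the C7-C7 site, about 3-fold on
# the C7 and Sp1C-C7 sites.
ratio = specificity_ratio(300.0, 3.0)
print(ratio)
```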
Although the invention has been described with reference to the presently preferred
embodiment, it should be understood that various modifications can be made without 
departing from the spirit of the invention. Accordingly, the invention is limited only 
by the following claims. 
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: The Scripps Research

(ii) TITLE OF INVENTION: ZINC FINGER PROTEIN DERIVATIVES AND
METHODS THEREFOR

(iii) NUMBER OF SEQUENCES: 62

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Fish & Richardson P.C.
(B) STREET: 4225 Executive Square, Suite 1400
(C) CITY: La Jolla
(D) STATE: CA
(E) COUNTRY: USA
(F) ZIP: 92037

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: PCT/US98/
(B) FILING DATE: 27-MAY-1998
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 08/863,813
(B) FILING DATE: 27-MAY-1997
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Haile, Lisa A., Ph.D.
(B) REGISTRATION NUMBER: 38,347
(C) REFERENCE/DOCKET NUMBER: 08401/010WO1

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 619/678-5070
(B) TELEFAX: 619/678-5099



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..25
(D) OTHER INFORMATION: Xaa at residue 1 = Tyr or Phe; Xaa at residues
2, 4, 5, 7-9, 11-15, 17-18, 20-22 and 24-25 = Ala, Arg, Asn, Asp, Cys,
Glu, Gln, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr,
or Val. Xaa at residue 5 = 1-3 amino acids; Xaa at residue 22 = 1-2
amino acids; Xaa at residue 25 = 1-5 amino acids.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

Xaa Xaa Cys Xaa Xaa Cys Xaa Xaa Xaa Phe Xaa Xaa Xaa Xaa Xaa Leu
1               5                   10                  15

Xaa Xaa His Xaa Xaa Xaa His Xaa Xaa
            20                  25



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 36 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: ZF

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..36

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
ATGAAACTGC TCGAGCCCTA TGCTTGCCCT GTCGAG  36



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 45 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: ZR

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..45

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GAGGAGGAGG AGACTAGTGT CCTTCTGTCT TAAATGGATT TTGGT  45

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 273 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: zif268Xho-Spe

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..273

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

CTC GAG CCC TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT  48
Leu Glu Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg Phe Ser
1               5                   10                  15

CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC CAG AAG  96
Arg Ser Asp Glu Leu Thr Arg His Ile Arg Ile His Thr Gly Gln Lys
            20                  25                  30

CCC TTC CAG TGT CGA ATA TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC  144
Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser Asp His
        35                  40                  45

CTT ACC ACC CAC ATC CGC ACC CAC ACA GGC GAG AAG CCT TTT GCC TGT  192
Leu Thr Thr His Ile Arg Thr His Thr Gly Glu Lys Pro Phe Ala Cys
    50                  55                  60

GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG AGG CAT  240
Asp Ile Cys Gly Arg Lys Phe Ala Arg Ser Asp Glu Arg Lys Arg His
65                  70                  75                  80

ACC AAA ATC CAT TTA AGA CAG AAG GAC ACT AGT  273
Thr Lys Ile His Leu Arg Gln Lys Asp Thr Ser
                85                  90
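
The correspondence between the 273-nucleotide coding sequence of SEQ ID NO: 4 and its 91-residue translation can be checked mechanically. The following is a minimal sketch using the standard genetic code; the variable and function names are illustrative, and the DNA string is copied from the listing above.

```python
# Sketch: translate the SEQ ID NO: 4 coding sequence and check its length.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

SEQ_ID_4 = """
CTC GAG CCC TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT
CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC CAG AAG
CCC TTC CAG TGT CGA ATA TGC ATG CGT AAC TTC AGT CGT AGT GAC CAC
CTT ACC ACC CAC ATC CGC ACC CAC ACA GGC GAG AAG CCT TTT GCC TGT
GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG AGG CAT
ACC AAA ATC CAT TTA AGA CAG AAG GAC ACT AGT
"""


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1 using one-letter amino acid codes."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


dna = "".join(SEQ_ID_4.split())   # remove spaces and line breaks
protein = translate(dna)
print(len(dna), len(protein))
print(protein)
```

In one-letter code the translation contains both natural inter-finger linkers mentioned earlier in the document, TGQKP and TGEKP.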

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 91 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Leu Glu Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg Phe Ser
1               5                   10                  15

Arg Ser Asp Glu Leu Thr Arg His Ile Arg Ile His Thr Gly Gln Lys
            20                  25                  30

Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser Asp His
        35                  40                  45

Leu Thr Thr His Ile Arg Thr His Thr Gly Glu Lys Pro Phe Ala Cys
    50                  55                  60

Asp Ile Cys Gly Arg Lys Phe Ala Arg Ser Asp Glu Arg Lys Arg His
65                  70                  75                  80

Thr Lys Ile His Leu Arg Gln Lys Asp Thr Ser
                85                  90



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: F7X3

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..22

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
GCAATTAACC CTCACTAAAG GG  22



(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: BZF3

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GGCAAACTTC CTCCCACAAA T  21



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 60 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: ZF36K

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..60

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
ATTTGTGGGA GGAAGTTTGC CNNKAGTNNK NNKNNKNNKN NKCATACCAA AATCCATTTA  60
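
SEQ ID NO: 8 contains degenerate NNK codons. The sketch below enumerates what an NNK codon can encode under the standard genetic code, using the IUPAC meanings N = A/C/G/T and K = G/T, and counts the randomized positions in this oligonucleotide. It is an illustration only; the names are hypothetical and the diversity figure is a theoretical DNA-level count, not a library size reported in this document.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# IUPAC ambiguity codes used in the listing.
IUPAC = {"N": "ACGT", "K": "GT"}


def expand(pattern: str) -> list:
    """All concrete sequences matched by a degenerate pattern."""
    return ["".join(p) for p in product(*(IUPAC.get(b, b) for b in pattern))]


nnk_codons = expand("NNK")
amino_acids = {CODON_TABLE[c] for c in nnk_codons} - {"*"}
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

SEQ_ID_8 = "ATTTGTGGGAGGAAGTTTGCCNNKAGTNNKNNKNNKNNKNNKCATACCAAAATCCATTTA"
randomized = SEQ_ID_8.count("NNK")
dna_diversity = len(nnk_codons) ** randomized

print(len(nnk_codons), len(amino_acids), stop_codons, randomized, dna_diversity)
```

Restricting the third position to G or T keeps all twenty amino acids accessible while leaving a single stop codon among the possible codons.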



(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: R3B

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
TTGATATTCA CAAACGAATG G  21



(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: ZFNsi-B

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CATGCATATT CGACACTGGA A  21

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 66 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: ZF2r6F

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..66

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CAGTGTCGAA TATGCATGCG TAACTTCNNK NNKNNKNNKN NKNNKACCAC CCACATCCGC  60
ACCCAC  66

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 66 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: AFI6rb

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..66

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
CTGGCCTGTG TGGATGCGGA TATGMNNMNN MNNMNNMNNC GAMNNAGAAA AGCGGCGATC  60
GCAGGA  66

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vii) IMMEDIATE SOURCE:
(B) CLONE: ZFIF

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..24

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
CATATCCGCA TCCACACAGG CCAG  24



(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Arg Ser Asp Glu Leu Thr Arg His
1               5



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..6

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Ser Arg Ser Asp His Leu
1               5



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..34

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
CGTAAATGGG CGCCCTTTTG GGCGCCCATT TACG  34



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

Arg Ser Asp Glu Arg Lys Arg His
1               5



(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

Trp Ser Ile Pro Val Leu Leu His
1               5



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

Trp Ser Leu Leu Pro Val Leu His
1               5

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Phe Ser Phe Leu Leu Pro Leu His
1               5



(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Leu Ser Thr Trp Arg Gly Trp His
1               5



(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

Thr Ser Ile Gln Leu Pro Tyr His
1               5



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 61 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..61

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
'GATCTCAGA AGCCAAGCAG GGTCGGGCCT GGTTAGTACT TGGATGGGAG ACCGCCTGGG 6 0 
1 c i 
(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..26

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

Tyr Ile Cys Ser Phe Ala Asp Cys Gly Ala Ala Tyr Asn Lys Asn Trp
1               5                   10                  15

Lys Leu Gln Ala His Leu Cys Lys His Thr
            20                  25



(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..26

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Phe Pro Cys Lys Glu Glu Gly Cys Glu Lys Gly Phe Thr Ser Leu His
1               5                   10                  15

His Leu Thr Arg His Ser Leu Thr His Thr
            20                  25



(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..26

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Phe Thr Cys Asp Ser Asp Gly Cys Asp Leu Arg Phe Thr Thr Lys Ala
1               5                   10                  15

Asn Met Lys Lys His Phe Asn Arg Phe His
            20                  25



(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 13 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..13

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
TGGATGGGAG ACC  13

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..8

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

Arg Ser Asp Glu Arg Lys Arg His
1               5

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
GTCCATAAGA TTAGCGGATC C  21



(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
GTGAGCGAGG AAGCGGAAGA G  21



(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 34 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 1..34

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
CCTGCGTGGG CGCCCTTTTG GGCGCCCACG CAGG  34



(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY: Peptide
(B) LOCATION: 1..4
(D) OTHER INFORMATION: Xaa at residue 4 = Lys or Pro.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

Thr Gly Glu Xaa
1           4



(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 462 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY:
(B) LOCATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

ATG CTC GAG CTC CCC TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC  48
Met Leu Glu Leu Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg
                5                   10                  15

TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC  96
Phe Ser Arg Ser Asp Glu Leu Tyr Arg His Ile Arg Ile His Thr Gly
            20                  25                  30

CAG AAG CCC TTC CAG TGT CGA ATA TGC ATG CGT AAC TTC AGT CGT AGT  144
Gln Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser
        35                  40                  45

GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC ACA GGC GAG AAG CCT TTT  192
Asp His Leu Thr Thr His Ile Arg Thr His Thr Gly Glu Lys Pro Phe
    50                  55                  60

GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG  240
Ala Cys Asp Ile Cys Gly Arg Lys Phe Ala Arg Ser Asp Glu Arg Lys
65                  70                  75                  80

AGG CAT ACC AAA ATC CAT ACC GGT CAG AAG CCC ACT AGT GGC GGT GGT  288
Arg His Thr Lys Ile His Thr Gly Gln Lys Pro Thr Ser Gly Gly Gly
                85                  90

CGG ATC GCC CGG CTG GAG GAA AAA GTG AAA ACC TTG AAA GCG CAA AAC  336
Arg Ile Ala Arg Leu Glu Glu Lys Val Lys Thr Leu Lys Ala Gln Asn
            100                 105                 110

TCC GAG CTG GCG TCC ACC CGG AAC ATG CTC AGG GAA CAG GTG GCA CAG  384
Ser Glu Leu Ala Ser Thr Ala Asn Met Leu Arg Glu Gln Val Ala Gln
        115                 120                 125

CTT AAA CAG AAA GTC ATG AAC CAC GCT AGC GGC CAG GCC GGC CAG TAC  432
Leu Lys Gln Lys Val Met Asn His Ala Ser Gly Gln Ala Gly Gln Tyr
    130                 135                 140

CCG TAC GAC GTT CCG GAC TAC GCT TCT TAA  462
Pro Tyr Asp Val Pro Asp Tyr Ala Ser
145                 150         153



(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 153 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

Met Leu Glu Leu Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg
                5                   10                  15

Phe Ser Arg Ser Asp Glu Leu Tyr Arg His Ile Arg Ile His Thr Gly
            20                  25                  30

Gln Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser
        35                  40                  45

Asp His Leu Thr Thr His Ile Arg Thr His Thr Gly Glu Lys Pro Phe
    50                  55                  60

Ala Cys Asp Ile Cys Gly Arg Lys Phe Ala Arg Ser Asp Glu Arg Lys
65                  70                  75                  80

Arg His Thr Lys Ile His Thr Gly Gln Lys Pro Thr Ser Gly Gly Gly
                85                  90                  95

Arg Ile Ala Arg Leu Glu Glu Lys Val Lys Thr Leu Lys Ala Gln Asn
            100                 105                 110

Ser Glu Leu Ala Ser Thr Ala Asn Met Leu Arg Glu Gln Val Ala Gln
        115                 120                 125

Leu Lys Gln Lys Val Met Asn His Ala Ser Gly Gln Ala Gly Gln Tyr
    130                 135                 140

Pro Tyr Asp Val Pro Asp Tyr Ala Ser
145                 150         153

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 462 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:
(A) NAME/KEY:
(B) LOCATION:

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:
(A) NAME/KEY:
(B) LOCATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

ATG CTC GAG CTC CCC TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC  48
Met Leu Glu Leu Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg
                5                   10                  15

TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC  96
Phe Ser Arg Ser Asp Glu Leu Thr Arg His Ile Arg Ile His Thr Gly
            20                  25                  30

CAG AAG CCC TTC CAG TGT CGA ATA TGC ATG CGT AAC TTC AGT CGT AGT  144
Gln Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser
        35                  40                  45

GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC ACA GGC GAG AAG CCT TTT  192
Asp His Leu Thr Thr His Ile Arg Thr His Thr Gly Glu Lys Pro Phe
    50                  55                  60

GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG  240
Ala Cys Asp Ile Cys Gly Arg Lys Phe Ala Arg Ser Asp Glu Arg Lys
65                  70                  75                  80

AGG CAT ACC AAA ATC CAT ACC GGT CAG AAG CCC ACT AGT GGC GGT GGT  288
Arg His Thr Lys Thr His Thr Gly Gln Lys Pro Thr Ser Gly Gly Gly
                85                  90                  95

CTG ACC GAC ACC CTG CAG GCG GAA ACC GAC CAG CTG GAA GAC GAA AAA  336
Leu Thr Asp Thr Leu Gln Ala Glu Thr Asp Gln Leu Glu Asp Glu Lys
            100                 105                 110

TCC GCG CTG CAA ACC GAA ATC GCG AAC CTG CTG AAA GAA AAA GAA AAG  384
Ser Ala Leu Gln Thr Glu Ile Ala Asn Leu Leu Lys Glu Lys Glu Lys
        115                 120                 125

CTG GAG TTC ATC CTG GCG GCA CAC GCT AGC GGC CAG GCC GGC CAG TAC  432
Leu Glu Phe Ile Leu Ala Ala His Ala Ser Gly Gln Ala Gly Gln Tyr
    130                 135                 140

CCG TAC GAC GTT CCG GAC TAC GCT TCT TAA  462
Pro Tyr Asp Val Pro Asp Tyr Ala Ser
145                 150



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 153 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY:

(B) LOCATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

Met Leu Glu Leu Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg 

5 10 15 

Phe Ser Arg Ser Asp Glu Leu Thr Arg His Ile Arg Ile His Thr Gly
20 25 30

Gln Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser
35 40 45

Asp His Leu Thr Thr His Ile Arg Thr His Thr Gly Glu Lys Pro Phe
50 55 60

Ala Cys Asp Ile Cys Gly Arg Lys Phe Ala Arg Ser Asp Glu Arg Lys
65 70 75 80

Arg His Thr Lys Thr His Thr Gly Gln Lys Pro Thr Ser Gly Gly Gly
85 90 95

Leu Thr Asp Thr Leu Gln Ala Glu Thr Asp Gln Leu Glu Asp Glu Lys
100 105 110

Ser Ala Leu Gln Thr Glu Ile Ala Asn Leu Leu Lys Glu Lys Glu Lys
115 120 125

Leu Glu Phe Ile Leu Ala Ala His Ala Ser Gly Gln Ala Gly Gln Tyr
130 135 140 

Pro Tyr Asp Val Pro Asp Tyr Ala Ser 
145 150 



(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



WO 98/54311

PCT/US98/10801

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 1..19
(D) OTHER INFORMATION: N at position 10 = 1-50 nucleic acids.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
CGCCCACGCN GCGTGGGCG 19 
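
As a rough illustration of how the OTHER INFORMATION note for SEQ ID NO: 37 might be read, the sketch below treats N at position 10 as a spacer of 1 to 50 nucleotides between the two flanking 9-base segments. The function name `spacer_length` is hypothetical, and restricting the spacer to A, C, G and T is an assumption made only for this example.

```python
import re

# Flanking segments copied from SEQ ID NO: 37. N (position 10) is read
# here as a spacer of 1-50 nucleotides; limiting the spacer alphabet to
# A/C/G/T is an assumption of this sketch, not a statement of the listing.
LEFT = "CGCCCACGC"
RIGHT = "GCGTGGGCG"
SPACED_SITE = re.compile(f"{LEFT}([ACGT]{{1,50}}){RIGHT}")


def spacer_length(dna: str):
    """Return the spacer length if dna contains the spaced site, else None."""
    match = SPACED_SITE.search(dna.upper())
    return len(match.group(1)) if match else None
```

A call such as `spacer_length("CGCCCACGC" + "A" + "GCGTGGGCG")` reports a one-nucleotide spacer, while a string with no spacer or with more than 50 spacer nucleotides yields `None`.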



(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 28 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 1..28
(D) OTHER INFORMATION: N at position 10 = 1-50 nucleic acids.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
CGCCCACGCN GCGGCGGCGG CGGCGGCG 28

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 76 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 1..76
(D) OTHER INFORMATION: Xaa at residue 76 = Tyr-Ala-Cys-Pro-Val-Glu-Ser-Cys-Asp-Arg-Arg-Phe-Ser-Lys-Ser-Ala-Asp-Leu-Lys-His-Ile-Arg-His-Thr-Gly-Glu-Lys-Pro-Met-Lys-Leu-Leu-Glu-Pro repeated from 2-10 times.

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

Met Lys Leu Leu Glu Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg 

5 10 15 



Arg Phe Ser Lys Ser Ala Asp Leu Lys Arg His Ile Arg His Thr Gly
20 25 30

Glu Lys Pro Met Lys Leu Leu Glu Pro Tyr Ala Cys Pro Val Glu Ser
35 40 45

Cys Asp Arg Arg Phe Ser Lys Ser Ala Asp Leu Lys His Ile Arg His
50 55 60

Thr Gly Glu Lys Pro Met Lys Leu Leu Glu Pro Xaa 

65 70 75 76 



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..34



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
CCTCGCCGCC GCGGGTTTTC CCGCGCCCCC GAGG 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 294 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY:

(B) LOCATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:



ATG AAA CTG CTC GAG CCC TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC 48
Met Lys Leu Leu Glu Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg
5 10 15

CGC TTT TCT AAG TCG GCT GAT CTG AAG CGC CAT ATC CGC ATC CAC ACT 96
Arg Phe Ser Lys Ser Ala Asp Leu Lys Arg His Ile Arg Ile His Thr
20 25 30

GGC GAA AAA CCG TAC GCG TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT 144
Gly Glu Lys Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg Phe
35 40 45

TCT AAG TCG GCT GAT CTG AAG CGC CAT ATC CGC ATC CAC ACC GGG GAG 192
Ser Lys Ser Ala Asp Leu Lys Arg His Ile Phe Ile His Thr Gly Glu
50 55 60

AAG CCC TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC TTT TCT AAG 240
Lys Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg Phe Ser Lys
65 70 75 80

TCG GCT GAT CTG AAG CGC CAT ATC CGC ATC CAC ACC GGT CAG AAG CCC 288
Ser Ala Asp Leu Lys Arg His Ile Arg Ile Asn Thr Gly Gln Lys Pro
85 90 95

ACT AGT 294
Thr Ser
98



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 98 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

Met Lys Leu Leu Glu Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg
5 10 15

Arg Phe Ser Lys Ser Ala Asp Leu Lys Arg His Ile Arg Ile His Thr
20 25 30

Gly Glu Lys Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg Phe
35 40 45

Ser Lys Ser Ala Asp Leu Lys Arg His Ile Phe Ile His Thr Gly Glu
50 55 60

Lys Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg Phe Ser Lys
65 70 75 80

Ser Ala Asp Leu Lys Arg His Ile Arg Ile Asn Thr Gly Gln Lys Pro

85 90 95 

Thr Ser 

98 



(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 543 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:




(B) LOCATION:

(ii) MOLECULE TYPE: peptide

(A) NAME/KEY:

(B) LOCATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

ATG CTC GAG CTC CCC TAT GCT TGC CCT GTC GAG TCC TGC GAT CGC CGC 48
Met Leu Glu Leu Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg
5 10 15

TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT ATC CGC ATC CAC ACA GGC 96
Phe Ser Arg Ser Asp Glu Leu Thr Arg His Ile Arg Ile His Thr Gly
20 25 30

CAG AAG CCC TTC CAG TGT CGA ATA TGC ATG CGT AAC TTC AGT CGT AGT 144
Gln Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser
35 40 45

GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC ACA GGC GAG AAG CCT TTT 192
Asp His Leu Thr Thr His Ile Arg Thr His Thr Gly Glu Lys Pro Phe
50 55 60

GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC AGG AGT GAT GAA CGC AAG 240
Ala Cys Asp Ile Cys Gly Arg Lys Phe Ala Phe Ser Asp Glu Arg Lys
65 70 75 80

AGG CAT ACC AAA ATC CAT ACC GGG GAG AAG CCC TAT GCT TGC CCT GTC 288
Arg His Thr Lys Ile His Thr Gly Glu Lys Pro Tyr Ala Cys Pro Val
85 90 95

GAG TCC TGC GAT CGC CGC TTT TCT CGC TCG GAT GAG CTT ACC CGC CAT 336
Glu Ser Cys Asp Arg Arg Phe Ser Arg Ser Asp Glu Leu Thr Arg His
100 105 110

ATC CGC ATC CAC ACA GGC CAG AAG CCC TTC CAG TGT CGA ATA TGC ATG 384
Ile Arg Ile His Thr Gly Gln Lys Pro Phe Gln Cys Arg Ile Cys Met
115 120 125

CGT AAC TTC AGT CGT AGT GAC CAC CTT ACC ACC CAC ATC CGC ACC CAC 432
Arg Asn Phe Ser Arg Ser Asp His Leu Thr Thr His Ile Arg Thr His
130 135 140

ACA GGC GAG AAG CCT TTT GCC TGT GAC ATT TGT GGG AGG AAG TTT GCC 480
Thr Gly Glu Lys Pro Phe Ala Cys Asp Ile Cys Gly Arg Lys Phe Ala
145 150 155 160

AGG AGT GAT GAA CGC AAG AGG CAT ACC AAA ATC CAT TTA AGA CAG AAG 528
Arg Ser Asp Glu Arg Lys Arg His Thr Lys Ile His Leu Arg Gln Lys

165 170 175 

GAC TCT AGA ACT AGT 543 
Asp Ser Arg Thr Ser 

180 

(2) INFORMATION FOR SEQ ID NO: 44:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 181 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(ix) FEATURE:

(A) NAME/KEY:
(B) LOCATION:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

Met Leu Glu Leu Pro Tyr Ala Cys Pro Val Glu Ser Cys Asp Arg Arg
5 10 15

Phe Ser Arg Ser Asp Glu Leu Thr Arg His Ile Arg Ile His Thr Gly
20 25 30

Gln Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser
35 40 45

Asp His Leu Thr Thr His Ile Arg Thr His Thr Gly Glu Lys Pro Phe
50 55 60

Ala Cys Asp Ile Cys Gly Arg Lys Phe Ala Phe Ser Asp Glu Arg Lys
65 70 75 80

Arg His Thr Lys Ile His Thr Gly Glu Lys Pro Tyr Ala Cys Pro Val
85 90 95

Glu Ser Cys Asp Arg Arg Phe Ser Arg Ser Asp Glu Leu Thr Arg His
100 105 110

Ile Arg Ile His Thr Gly Gln Lys Pro Phe Gln Cys Arg Ile Cys Met
115 120 125

Arg Asn Phe Ser Arg Ser Asp His Leu Thr Thr His Ile Arg Thr His
130 135 140

Thr Gly Glu Lys Pro Phe Ala Cys Asp Ile Cys Gly Arg Lys Phe Ala
145 150 155 160

Arg Ser Asp Glu Arg Lys Arg His Thr Lys Ile His Leu Arg Gln Lys

165 170 175 

Asp Ser Arg Thr Ser 

180 



(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 46 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 1..34

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

GAGGAGGAGG AGGGATCCAT GCTCGAGCTC CCCTATGCTT GCCCTG 46



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 39 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..34

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
GAGGAGGAGA CCGGTATGGA TTTTGGTATG CCTCTTGCG 39



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 57 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..57



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 
GAGGAGGAGA CCGGTGAGAA GCCCTATGCT TGCCCTGTCG AGTCCTGCGA TCGCCGC 57 



(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 1..32

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
GAGGAGGAGA CTAGTTCTAG AGTCCTTCTG TC 32



(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 38 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..38

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
GATGTATGTA GCGTGGGCGG CGTGGGCGTA AGTAATGC 38

(2) INFORMATION FOR SEQ ID NO: 50:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 38 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : DNA (genomic) 

(ix) FEATURE: 

(A) NAME /KEY : CDS 

(B) LOCATION: 1 . .38 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 

r^*T* > T T rr> r~> /-* /~+ r~* *^ "» ^ -r* * ^ >^ ^ ~* Ct 



J- I 



(2) INFORMATION FOR SEQ ID NO: 51:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 38 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear




(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 1..38

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
GATGTATGTA GCGGCGGCGG CGGCGGCGTA AGTAATGC 38



(2) INFORMATION FOR SEQ ID NO: 52: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..29

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
GATGTATGTA GCGTGGGCGT AAGTAATGC 29



(2) INFORMATION FOR SEQ ID NO: 53:



(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 29 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE : DNA (genomic) 

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..29

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
GATGTATGTA GGGGCGGGGT AAGTAATGC 29

(2) INFORMATION FOR SEQ ID NO: 54:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..29

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
GATGTATGTA GCGTGGGCGT AAGTAATGC 29

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE:

(A) NAME / KEY : CDS 

(B) LOCATION: 1 . .41 

(xi) SEQUENCE DESCRIPTION : SEQ ID NO: 55: 
GAGGAGGAGG AATTCCGACA TTTATAATGA ACGTGAATTG C 41 

(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 45 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS
(B) LOCATION: 1..45

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:
TGCGCCCACG CCGCCCACGC GATGATTGGG AGCTTTTTTT GCACG 45

(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 51 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..51

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
TCGCGTGGGC GGCGTGGGCG CAAAAAATTA TTATCATGGA TTCTAAAACG G 51

(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..42

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
GAGGAGGAGG CGGCCGCAGG TAGATGAGAT GTGACGAACG TG 42



(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 45 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..45

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
TGCCCCGCCC CCGCCCACGC GATGATTGGG AGCTTTTTTT GCACG 45

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..51

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:
TCGCGTGGGC GGGGGCGGGG CAAAAAATTA TTATCATGGA TTCTAAAACG G 51



(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:



GCGTGGGCGG CGTGGGCG 18 
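
To make the layout of this 18-base-pair site easier to see, the following sketch splits SEQ ID NO: 61 into two contiguous 9-base-pair half-sites and then into 3-base-pair subsites. The helper name `split_site` is hypothetical, and the one-subsite-per-finger reading is an assumption drawn from the three-finger, nine-base-pair relationship described in the abstract.

```python
def split_site(site: str, half_len: int = 9, triplet_len: int = 3):
    """Split a contiguous target site into half-sites and 3-bp subsites.

    Returns a list of half-sites, each given as a list of triplets.
    The 9-bp and 3-bp defaults are assumptions of this sketch.
    """
    site = site.replace(" ", "").upper()
    if len(site) % half_len != 0:
        raise ValueError("site length must be a multiple of the half-site length")
    halves = [site[i:i + half_len] for i in range(0, len(site), half_len)]
    return [
        [half[j:j + triplet_len] for j in range(0, half_len, triplet_len)]
        for half in halves
    ]


# Sequence copied from SEQ ID NO: 61 (the space is the listing's 10-base grouping).
SEQ_ID_61 = "GCGTGGGCGG CGTGGGCG"
HALF_SITES = split_site(SEQ_ID_61)
```

Under these assumptions both half-sites of SEQ ID NO: 61 come out identical, which is consistent with a site built from two copies of the same 9-base-pair sequence.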



(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:

j j * 'o ^j'-j ^. GGGCGGGG 18 
CLAIMS

1. An isolated zinc finger-nucleotide binding polypeptide variant comprising at least
two zinc finger modules that bind to a cellular nucleotide sequence and modulate
the function of the cellular nucleotide sequence.

2. The variant of claim 1, wherein the modulation is enhancement of transcription
of a gene operatively linked to the cellular nucleotide sequence.

3. The variant of claim 1, wherein the modulation is suppression of transcription of
a gene operatively linked to the cellular nucleotide sequence.

4. The variant of claim 1, which is derived from a zinc finger-nucleotide binding
polypeptide selected from the group consisting of zif268 and TFIIIA.

5. The variant of claim 1, wherein the cellular nucleotide sequence is DNA.

6. The variant of claim 1, wherein the cellular nucleotide sequence is RNA.

7. The variant of claim 1, wherein the polypeptide contains a linker region between
zinc fingers, the linker having an amino acid sequence TGEKP.

8. The variant of claim 1, wherein the cellular nucleotide sequence is a structural
gene nucleotide sequence.

9. The variant of claim 1, wherein the cellular nucleotide sequence is a promoter
nucleotide sequence.

10. The variant of claim 9, wherein the promoter is an onco-promoter.




11. The variant of claim 10, wherein the promoter is a viral promoter.

12. The variant of claim 1, wherein the cellular nucleotide sequence is a retroviral
nucleotide sequence.

13. The variant of claim 12, wherein the retrovirus is a human T-cell lymphotrophic
virus (HTLV).

14. The variant of claim 13, wherein the retrovirus is HTLV-1 or HTLV-2.

15. The variant of claim 12, wherein the retrovirus is a human immunodeficiency
virus (HIV).

16. The variant of claim 15, wherein the retrovirus is HIV-1 or HIV-2.

17. The variant of claim 1, wherein the cellular nucleotide sequence is an oncogene
nucleotide sequence.

18. The variant of claim 1, wherein the cellular nucleotide sequence is a plant cellular
nucleotide sequence.

19. A nucleotide sequence encoding a zinc finger-nucleotide binding polypeptide
variant of claim 1.

20. A recombinant expression vector containing the zinc finger-nucleotide binding
polypeptide variant of claim 1.

21. A pharmaceutical composition comprising a therapeutically effective amount of
a zinc finger-nucleotide binding polypeptide derivative or a therapeutically
effective amount of a nucleotide sequence which encodes a zinc finger-nucleotide
binding polypeptide derivative, wherein the derivative binds to a cellular
nucleotide sequence to modulate the function of the cellular nucleotide sequence,
in combination with a pharmaceutically acceptable carrier.

22. The pharmaceutical composition of claim 21, wherein the modulation is
enhancement of transcription of a gene operatively linked to the cellular
nucleotide sequence.

23. The pharmaceutical composition of claim 21, wherein the modulation is
suppression of transcription of a gene operatively linked to the cellular nucleotide
sequence.

24. The pharmaceutical composition of claim 21, wherein the zinc finger-nucleotide
binding polypeptide derivative is a truncated wild-type zinc finger-nucleotide
binding domain.

25. The pharmaceutical composition of claim 21, wherein the zinc finger binding
polypeptide derivative is a variant polypeptide.

26. A method for inhibiting a cellular nucleotide sequence comprising a zinc
finger-nucleotide binding motif, the method comprising contacting the motif with an
effective amount of a zinc finger-nucleotide binding polypeptide derivative which
binds the motif.

27. The method of claim 26, wherein the zinc finger binding polypeptide derivative
is a truncated zinc finger protein.

28. The method of claim 26, wherein the zinc finger polypeptide derivative is a
variant polypeptide. 

29. The method of claim 26, wherein the cellular nucleotide sequence is DNA. 

30. The method of claim 26, wherein the cellular nucleotide sequence is RNA. 

31. The method of claim 26, wherein the cellular nucleotide sequence is a structural
gene nucleotide sequence.

32. The method of claim 26, wherein the cellular nucleotide sequence is a promoter
nucleotide sequence.

33. The method of claim 26, wherein the cellular nucleotide sequence is an oncogene
nucleotide sequence.

34. The method of claim 26, wherein the cellular nucleotide sequence is a plant
cellular nucleotide sequence.

35. The method of claim 31, wherein the zinc finger binding polypeptide derivative
is a variant polypeptide.

36. A method of treating a subject with a cell proliferative disorder, wherein the
disorder is associated with the modulation of a cellular nucleotide sequence
associated with a zinc finger-nucleotide binding motif, comprising contacting the
zinc finger-nucleotide binding motif with an effective amount of a zinc
finger-nucleotide binding polypeptide derivative that binds to the zinc finger-nucleotide
binding motif to modulate activity of the cellular nucleotide sequence.

37. The method of claim 36, wherein an expression vector comprising a
polynucleotide sequence encoding a zinc finger-nucleotide binding polypeptide 
derivative is introduced into the cells of the subject. 



38. The method of claim 37, wherein the expression vector is a virus. 



39. The method of claim 36, wherein the modulation is enhancement of transcription 
of a gene operatively linked to the cellular nucleotide sequence. 



40. The method of claim 36, wherein the modulation is suppression of transcription 
of a gene operatively linked to the cellular nucleotide sequence. 



41. The isolated zinc finger-nucleotide binding polypeptide variant comprising at 
least four zinc finger modules that bind to a cellular nucleotide sequence and 
modulate the function of the cellular nucleotide sequence. 



42. The isolated zinc finger-nucleotide binding polypeptide variant of claim 1,
comprising at least six zinc finger modules that bind to a cellular nucleotide
sequence and modulate the function of the cellular nucleotide sequence. 



43. The isolated zinc finger-nucleotide binding polypeptide variant of claim 1, 
wherein the polypeptide binds to a cellular nucleotide sequence having 18 
contiguous base pairs. 



44. The isolated zinc finger-nucleotide binding polypeptide variant of claim 1, 
wherein the polypeptide binds to a cellular nucleotide sequence comprising two 
9-base pair binding sites. 
45. The isolated zinc finger-nucleotide binding polypeptide variant of claim 44,
wherein the two 9-base pair binding sites are separated by a variable number of 
nucleotides. 



46. The isolated zinc finger-nucleotide binding polypeptide variant of claim 44,
wherein the two 9-base pair binding sites are contiguous. 

47. The nucleotide sequence of claim 19, further comprising a transcriptional a- 



48. The nucleotide sequence of claim 47, wherein the transcriptional activation 
domain is a herpes simplex virus VP16 protein.

49. The nucleotide sequence of claim 19, further comprising a repressor domain in 
operable linkage with the nucleotide sequence. 

50. The nucleotide sequence of claim 49, wherein the repressor domain is the 
Kruppel-associated box A domain (KRAB-A).






FIG. 7: nucleotide sequence (numbered in tens from 10 to 270) shown with its complementary strand and one-letter amino acid translation.






FIG. 8A: one-letter amino acid sequences of the zinc fingers (labels include FINGER 2 and FINGER 3), with the ANTIPARALLEL β-SHEET and α-HELIX regions marked.



FIG. 8B: DNA binding site diagram (legible labels: pIII, HIV DNA).






FIG. 10: gel image, lanes 1 to 6 (legible labels: Cells, MW Stds, IPTG, IPTG, Cyt, IBs, Pure).






FIG. 13A: nucleotide sequence (numbered in tens from 10 to 330) shown with its complementary strand and one-letter amino acid translation; the LINKER and JUN regions are labeled.






FIG. 13B: continuation of the nucleotide sequence (numbered in tens to 460) with its complementary strand and one-letter amino acid translation; the DECAPEPTIDE TAG is labeled.




FIG. 14A: nucleotide sequence (numbered in tens from 10 to 330) shown with its complementary strand and one-letter amino acid translation; the LINKER and FOS regions are labeled.






FIG. 14B: continuation of the nucleotide sequence (numbered in tens to 460) with its complementary strand and one-letter amino acid translation; the DECAPEPTIDE TAG is labeled.






10 20 30 40 

atc aaa ctg c;c gac cgc tat got tcg c:t ctc gag tcc tcg gat cgc 

1 ^ 1 1 t un„ unu u . w uuu A i a Auu UjA L*u w t L Auu A^o L i A (jUj 

MI <LL:?YACPV[SCuR> 



-0 60 70 80 90 

* • t t • 

CGG ITT TGI AAG TCG CGI CAT CTG AAC CGC CAT ATG CCC ATC CAC AC T 
GCC AAA ACA TTG AGG CCA CTA CAC TTC CCC CTA TAG GGG TAG CTG TGA 

« r S K S A D L K R H i R ! H T> 



'wG ilO 120 130 140 

• • • • • 

urvn .-A A i Au Cu'j CG i GTC GAj T\»« iyi» CAT CGC CGG Ti i 

^ ■ 1 ' i .*\ ■ j wjw r-„> uuA L * w r«rj n^y urn y^j nr.- 

C:K?YAC?VESCDRRF> 





[FIG. 15, nucleotide positions 150-240: sequence rows illegible in this copy]



250 260 270 280

• • # t 

TCG CCT CAT CTG AAC CCC CAT ATC CCC ATC CAC ACC CCT CAC AAG CCC 

ACC CGA CTA GAC TTC CCC CTA TAG CCC TAG GTC TCG CCA CTC TTC GGG 

SAQLKRH 1 R I NTGQKP> 



290 

ACT ACT 

TGA TCA 

T S> 



FIG. 15 






WO 98/54311 PCT/US98/10801

18 / 22 



10 20 30 40 

A i o w Tl GAu L ■ w www iAi i *uu i ij>u uAo i *>«-. iww 0^1 ujw Cju 

7*r ^ t /» rT" ^ i I-*'* i ir* 7 t a a** 

I AL \jrxj uiv w^w" Uuu aim UvA n^u Uua LAj w>w aoo tin 'joo uuj 

U I E I P Y A C P V E S C D £ R> 



50 60 70 80 90

• t t * • 

ITT TGT COO TCC GAT GAG CTT ACG CGG CAT ATC CGG ATl wAw AGA CGG 

AAA AGA GGG AGG CTA CTC CAA TGG CCG CIA I AG CGG TAG CTC TGT CGG 

f S R 5 D E L T R H I R I H T C> 



100 110 120 130 140

t • • t « 

— j «- i i - »- ~ i " t»- i-j i*" w-- j i ^ T77 • * a~ 7 a " 7 

r.-j „ f I'w L.U Ui aiA ig v A l Li l-U I a.-.w I ■ « au l uj I rtjl 

o . • uju A.Au Liw Aw.^ C-i lAl Auo i/-.w li-A i l(j a,*vj 1^.-. uwA t LA 

QKPFOCR I CMRNFS.^S> 



[FIG. 16A, nucleotide positions 150-190: sequence rows illegible in this copy]



200 210 220 230 240 

• i t • • 

CGG TOT CAC ATT TGT CGG AC-0 A-C TTT CCC ACG ACT CAT CAA CGG AAC 

CCC AC;. GTC TAA ACA CCG TCC TTC AAA CCC TCC TCA CTA CTT CGG TTC 

A C 0 ! C C R K F A R S 0 E R K> 







250 260 270 280

... CAT ACC AAA ATC CAT ACC CCC GAG AAC CCC TAT ... ... ... CTC

TCC CTA TGC TTT TAG CTA TCC CCC CTC TTC CCC ATA CCA ACC CCA CAC

R H T K I H T C E K P Y A C P V>



290 300 310 320 330



r * - 1 - *T* TTT T ""* r, T ^ 1 ^ "7 A'*" ^ m TAT 

U^LJ i ww low LiMi www www ill Iwi Low iwU Uttl UrtU w< I Aww wWw wA I 

wlw A(ju Auo Li A uu) LiwU aAA AuA bwAJ Alio wlA Liw UAA Ijy UlA 

E S C D R R F S R S D E L T R K> 



FIG.16A 



WO 98/54311 PCT/US98/10801 

19 / 22 



340 350 360 370 380

• • a ■ ■ 

aic ccc at: cac aca ggc cag aac c:c ttc cac tgi cga ata ;gg atc 

TAG GGC TAG CTG TGI CCC CTG TTC GGG AAG GTG ACA CC7 TaT ACG TAG 
!RlHTC0KPFOCRICU> 



390 400 410 420 430

• • « ■ * 

CGT AAC TTC ACT CCT ACT CAC CAC CTT ACC ACC CAC ATC CCC ACC CAC 
CCA TTG AAG TCA CCA TCA CTG CTG GAA TGG TGG CTG TAC CCC TCC CTG 
SNFSRS0HLT7HiSTH> 



440 450 
• ■ 

ACA GGC CAG AAG CG7 TTT GCC TCT 

TGT CCC CTC TTC CCA AAA CCC ACA 

T C : K F F A C 



460 470 480

• * • 

GAC AT i iC i GCC AGu A-w i • " Gw« 

CTG TAA ACA CCC TCC TTC AAA CCC 

0 ! C C R K F A> 



490 500 510 520

ACG ACT GAT GAA CCC AAG ACG CAT ACC AAA ATC CAT 77 A ACA CAG AAC 
7CG TCA C7A CTT GCG TTC TCC CTA TGG TTT TAG GTA AAT TCT CTC TTC 
RSDERKRHTKIHLRQK> 



530 540 

GAC TCT ACA ACT ACT 
CTG AGA TCT TGA TCA 

0 S R T S> 



FIG.16B 






WO 98/54311 



21/ 22 



PCT/US98/10801 




WO 98/54311 PCT/US98/10801

22 / 22 





INTERNATIONAL SEARCH REPORT 



International application No 

PCT/US98/10801



A. CLASSIFICATION OF SUBJECT MATTER 
IPC(6) Please See Extra Sheet 

US CL Please See Extra Sheet. 
According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED 

Minimum documentation searched (classification system followed by classification symbols) 
U.S. : 530/350, 400; 435/69.1, 252.3, 320.1, 417; 514/6; 424/450; 536/23.5, 23.6, 23.72

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched



Electronic data base consulted during the international search (name of data base and, where practicable, search terms used) 
APS and DIALOG files: Derwent Biotechnology Abstracts, Current BioTech Abstracts, CA Search, Biosis and Medline



DOCUMENTS CONSIDERED TO BE RELEVANT 



Category* 



Citation of document, with indication, where appropriate, of the relevant passages



Relevant to claim No. 



X 
Y 



ROLLINS, M. B. et al. Role of TFIIIA Zinc Fingers In Vivo: 
Analysis of Single Finger Function in Developing Xenopus 
Embryos. Molecular and Cellular Biology. August 1993, Vol. 13, 
No. 8, pages 4776-4783, especially pages 4777-4783. 

QUIGLEY, C. A. et al. Complete Androgen Insensitivity Due to 
Deletion of Exon C of the Androgen Receptor Gene Highlights the 
Functional Importance of the Second Zinc Finger of the Androgen 
Receptor In Vivo. Molecular Endocrinology. July 1992, Vol. 6, No. 
7, pages 1103-1112, especially pages 1104-1108. 



1-5, 7, 9, 26, 28, 
29, 32, 41-43 and 
46 



1, 3, 5, 9, 19, 20, 
26, 27, 29 and 32 

21, 23, 24, 46-49, 
and 50 



Further documents are listed in the continuation of Box C.    See patent family annex.



* Special categories of cited documents:

"A" document defining the general state of the art which is not considered to be of particular relevance

"E" earlier document published on or after the international filing date

"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or other means

"P" document published prior to the international filing date but later than the priority date claimed

"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention

"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone

"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art

"&" document member of the same patent family


Dale of the actual completion of the international search 
26 AUGUST 1998 


Date of mailing of the international search report 

20 OCT 1998 


Name and mailing address of the ISA/US 
Commissioner of Patents and Trademarks
Box PCT 

Washington, D.C. 20231
Facsimile No. (703) 305-3230 


Authorized officer

WILLIAM W MOORE
Telephone No. (703) 308-0196



Form PCT/ISA/210 (second sheet)(July 1992)*







C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT


Category*


Citation of document, with indication, where appropriate, of the relevant passages 


Relevant to claim No 


X 
Y 


RAY, A. et al. Repressor to activator switch by mutations in the 
first Zn finger of the glucocorticoid receptor: Is direct DNA
binding necessary? Proceedings of the National Academy of 
Sciences, U.S.A. August 1991, Vol. 88, No. 16, pages 7086-7090, 
especially pages 7087-7090. 


1, 2, 5, 8, 19 and 
20 

21, 22, 25, and 
36-39 


X,P 
Y,P 


US 5,578,483 A (EVANS et al.) 26 November 1996, cols. 6-18 
and Figures 1-4. 


1-3, 5, 9, 19, 20,
26-29, 32 and 47

10, 17, 21-29, 32, 
33, and 36-40 


Y 


US 5,198,346 A (LADNER et al) 30 March 1993, cols. 4, 14-21, 
26-41, 51-55, 64-74, 77-89, 132-140, 145-153, and claims 1, 2, 5, 
24-28, 30, and 37-47. 


1-23, 25, 26,
28-44 and 46


Y 


US 4,990,607 A (KATAGIRI et al) 05 February 1991, cols. 2-5 


18 and 34 


Y 


US 5,376,530 A (DE THE et al) 27 December 1994, cols. 5-18, 
and 21-27. 


17, 21-23, 25, 26, 
28, 29, 32 and 33 


Y 


BERGQVIST, A. et al. Loss of DNA-binding and new
transcriptional trans-activation function in polyomavirus large T-
antigen with mutation of zinc finger motif. Nucleic Acids
Research. May 1990, Vol. 18, No. 9, pages 2715-2720, especially
pages 2717-2720.


2, 5, 9, 11, 19, 
20-22, 25, and 
36-39 


Y 


RAUSCHER, F. J. et al. Binding of the Wilm's Tumor Locus Zinc
Finger Protein to the EGR-1 Consensus Sequence. Science. 30
November 1990, Vol. 250, pages 1259-1261, see entire article.


2, 4, 5, 7, 9, 10,
19-21, 23-29, 32, 
33, 36-38, 40 and 
41 


Y 


JACOBS, G. H. Determination of the base recognition positions of 
zinc fingers from sequence analysis. The EMBO Journal 
December 1992, Vol. 11, No. 12, pages 4507-4517, especially
pages 4508 and 4515 and Figure 1a.


-7 


Y 


WRIGHT, J. J. et al. Expression of a Zinc Finger Gene in
HTLV-I- and HTLV-II-transformed cells. Science. May 1990, Vol. 248,
pages 588-591, see entire article.


12-14, 21, 23, 25, 
26, 28, 29, and 
36-38 


Y 


JULIAN, N. et al. Replacement of His 23 by Cys in a zinc finger of


6, 12, 15, 16, 21,



" - - » t >.- — - T -?t* W t-tr - v *'ttt *t T 






C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT 



Category* 



Citation of document, with indication, where appropriate, of the relevant passages



SOUTH, T. L. et al. The Nucleocapsid Protein Isolated from
HIV-1 Particles Binds Zinc and Forms Retroviral-Type Zinc Fingers.
Biochemistry. November 1990, Vol. 29, No. 34, pages 7786-7789,
especially pages 7787-7789. 

YU, M. et al. A hairpin ribozyme inhibits expression of diverse 
strains of human immunodeficiency virus type I. Proceedings of 
the National Academy of Sciences, U.S.A. July 1993, Vol. 90, 
pages 6340-6344, especially page 6343. 



DEBS, R. J. et al. Regulation of Gene Expression in Vivo by 
Liposome-mediated Delivery of a Purified Transcription Factor. 
The Journal of Biological Chemistry. 25 June 1990, Vol. 265, No. 
18, pages 10189-10192, especially pages 10190-10191. 

JAMIESON, A.C. et al. In Vitro Selection of Zinc Fingers with 
Altered DNA-Binding Specificity. Biochemistry. May 1994, Vol. 
33, No. 19, pages 5689-5695, see entire document. 

AGARWAL, A. et al. Stimulation of Transcript Elongation 
Requires both the Zinc Finger and RNA Polymerase II Binding 
Domains of Human TFIIS. Biochemistry. July 1991, Vol. 30, No. 
31, pages 7842-7851. 

THUKRAL, S. K. et al. Mutations in the Zinc Fingers of ADR1 
That Change the Specificity of DNA Binding and Transactivation. 
Molecular and Cellular Biology. June 1992, Vol. 12, No. 6, pages
2784-2792. 



Relevant to claim No. 



6, 8, 12, 15, 16, 
21, 25, 26, 28, 
30, 31, 35 and 
36-38 

3, 5, 9, 12, 15, 
16, 19, 20, 21, 
23, 25, 26, 28, 
29, 32, 36, 37 
and 40 

21-25 and 36-40 



8 and 31 



2, 22 and 39 



1-42 and 47 



Form PCT/ISA/210 (continuation of second sheet)(July 1992)*






A. CLASSIFICATION OF SUBJECT MATTER 

IPC (6): C12N 15/01, 15/11, 15/12, 15/33, 15/62, 15/70; C07K 14/00, 14/005, 14/435, 19/00; A61K 38/16, 38/17; C12Q
1/02, 1/68, 1/70

A. CLASSIFICATION OF SUBJECT MATTER:

US CL: 530/350, 400; 435/69.1, 252.3, 320.1, 417; 514/6; 424/450; 536/23.5, 23.6, 23.72



WO 98/54311 PCT/US98/10801

1 / 22




WO 98/54311 PCT/US98/10801



2 / 22



WO 98/54311



5 / 22 



PCT/US98/10801




WO 98/54311



7 / 22 



PCT/US98/10801



10 2D JO <Q 

* • • i 

CK Cti CCC TAT CCT FCC CCT GTC CAC TCC TCC CAT CGC CCC 77T TCI 
CAC CTC CGC A7A CCA ACC CCA CAC CTC A3G ACC CIA CCG CGC AAA 
LEPYACPVESCDRRES> 

5D 60 70 EC 90 

* * « i i 

GCC TCC CA7 C« CTT ADC CGC CAT ATC CCC AFC CAC ACA GCC CAC AAC 
tCC ACC CTA CTC CAA TCC CCC GTA TAG GCC 7 AC CTC TGT CCG CTC TTC 

* S D E L 7 R E J R ! E T C 0 ft> 

1 DO HO 120 130 14D 

> * » i * 

DCC TTC CAE TCi CCA ATA TCC AFC CGT AAC TTC ACT CCT AC! CAC ZK 

CCG AAC CTC .ACA CCT TAT ACC TAC GCA TTC AAC TCA CCA 7CA C7C CTC 

r F C C R i C u ft .H F S R S D *> 

150 IcO 170 1SD 190 

CTT ACC *£C CAC ATC CCC ACC CAC AW CCC CAC AAC CCT TT; CCC TC7 
CA.i. TCC TCC CTC TAC GCC TCC CTC TGT CCC CTC T7C GCA AAA, CCC AD. 
LTTrilRTETCEXPFAD 

20D 210 220 2 JO 2*0 

• • tip 

CAC ATT TCT CCG AGG AAC FTT CCC ACC «T GAT CAA CGC A# ACC CAT 

CTC TAA ATA CCC TCC TTC AAA CCC TCC TCA CTA CTT COG TTC TCC CTA 

DICCRKF A R S D E R X R [> 

250 260 270 

» » * 

ACC AM ATC CAT TTA ACA CAC AAC GAC ACT 
FGC ITT TAC GTA AAT TCT CTC TTC CTC TCA TCA 

T K t E L R 0 K I 1 £> 



FIG.7



WO 98/54311



8 / 22 



PCT/US98/10801



i — i r 



c - htlu 



WLElPrACPvESCDRRFSR.S D E L 7 R H f R I K 7 
C 0 K P T 0 t ft ] - ■ £ H K / 5 S S D K I T T H ] R 7 H T 

CtXPFACDI — CCAKTARS D g R K R H T k I H I 
SOU DSR7S7SCQAGOYPYDYPDYAS 

i — BecapTpTTE — 



FIG.8A 



FINGER 1
FINGER 3



B S A - C c - 



C C 7 
G D A 



7 



C C C T C G C C c]c C C T 

CGCACCCGCJCGC 7 
— _ ' ' 1 




C I G 


i r c 


T C I 


' G k C 


A A C 


A C A 



HfV CfU 



FIG.8B



WO 98/54311 PCT/US98/10801

10 / 22 



Cells 

MW 1 : 




1 2 3 4 5 6 

FIG. 10 



11 / 22 




WO 98/54311



12 / 22 



PCT/US98/10801



o 







13 / 22 



PCT/US98/10801



10 



20 



30 



<0 



Arc cic cac cic ccc ui ccr tcc cct ctc cac tcc tcc cat ccc ccc 

TaC CAG CTC CAC CCC AlA CCA ACC CGA CAS CIC ACC ACC CTA GCG CCC 
«LfLPYACPV£SCD(?fi> 



50 



50 



70 



60 



90 



Hi 7CT KC TCC GAT CAC CTT ACC CCD CAT AlC CCC ATC CAC ACA CCC 
AAA AGA COC ACC CTA CTC CM TCC CCC CIA 7 AC CCC TAG CTC TCT CCC 
FSRSDELTRH[K[K7C> 



10D 



110 



12D 



130 



HO 



CAC AAC ZZZ TIC CAC TOT CCA ATA <CC ATC Lui AAC i IC AG I CCT ACT 

gtc nc cc: aac cic a:a ccr tat acc tac cca nc aa; tca cca ica 

OKProCR] CUXMF s«s> 



150 



150 



170 



ISO 



190 



CAT CAT CH ACC ACC CAC ATC CCC ACC CAC AC* KC CAC AAC CCT Hi* 
CTC CTG CM TCC TCC GTC TAG CCC TCC C TG TCI CCC CTC 7TC GCA MA 
D « I I I H I R T hf i G E K P F,. 



aw ".") iz> ::o 2« 

« < > a i 

CCC ICT CAC ATT TCT GCC ACC AAC TTT CCC ACC ACT CAT CAA CCC AAC 
CTC ACA CTC TM ATA CCC TCC TTC AW, CCC TCC TCA CTA CTT CCC TTC 
ACOiCCSxrARSOERK> 



250 



260 



270 



280 



ACC CAT ACC AAA ATC CAT ACC GC 7 CAC AAC CCC ACT ACT CCC CGI CCT 
TCC CTA TCC TTT TAG CTA TCC CCA CIC 7TC CCC TCA TCA CCC CCA CCA 
SHTK | HIGQKP75GGC> 



2?0 



J30 



310 



3?0 



333 



CCC ATC CCC CCC C IC GAG CAA AM GTC AAA ACC TTC M* GCG CAA AAC 
CCC TAG CGC CCC CAC CTC CTT TTT CAC TTT TCC AAC TTT CCC CTT TTC 
* 1 f A ?. I E E K V K I L K A Q K> 



FIG.13A 



WO 98/54311



14 / 22



PCT/US98/10801



340 530 360 270 J30 

* « « « < 

TOC CAC CTC CCC ICC ACC GCC AAC A7C CTC ACC CaA OC CTC CCA CAE 

«C CTC CAC CCC AGC TCC CGC TIC TAC GAC ICC CTT CIC CAC CCT CTC 

SE L AST AWJW.LRrov/*C?> 



290 400 410 420 430 

* III I 

CIT AAA CAG AAA CTC ATC AAC CAC GC7 ACC GCC CAC CCC GCC CaG TAC 
GAA IfT CTC TT7 CK TAC TIC CTC CCA TOC CCC G7C CGC CCC GTC ATC 
I KQKVMNHASCOAG Q ft 



440 «0 460 

I I 9 

CCS TAC C>: CTT CCC CAC TA: GCi TC7 TM 
CCC ATC CTC CAA CCC CTC A"C CCA ACA Ail 
PY&ypprAS ■> 

DECAPEPTIDE TAG



FIG.13B 



WO 98/54311 PCT/US98/10801

15 / 22



'0 20 30 40 

• • $ , 

we C7c cac cjc ccc tat ccr tgc cct ctc gac tcc tcc cm ccc ccc 

TAC GaC CTC GAC GCC AT A CCA ACC CCA CAC C7C ACC ACC CU CCC CCC 
L ' L £ L P r A C P V E S C D R R> 

50 M 70 80 90 

TTT 7C7 CCC TCC CAT CAC CIT AX CCC CAT ATC OCC ATC CAC ACA CCC 
AAA ACA CCC ACC CTA CTC CAA TGC CCC GTA 7 AC CCG TAG G7C TCT CCC 
■ r S R S 0 E l T R H [ I? I H T ^ 

!K) Hft 120 130 KO 



I 



S *5 £ 7TC C ^ TCr ^ ATA TCC A7C aT TTC CC7 ACT 
CTC TTC CCG AAC CTC ACA CCT TAT ALL ?AC CCA TTC A*C TCA CCA 7CA 

GK ? "5CRICMJ!NFSRS> 

'50 160 170 180 190 

• • • 

c;: cac ci7 acc etc atc gc a:; cac aca gz gac a*; cct ttt 

CTC CTC CAA ICG TCC G7J TAG CCC TCC CTC TCT CCC CJC 7TC fiCA AAA 
■> * <■ < < H j .P T K T C E X P f> 

2W 2 TO 220 230 2<Q 

» * • , 

GCC TCT CAC ATT TCT CCG ACC AAC TTT CCC ACC ACT CAT CM CGC AAG 
CCC ACA CTB TAA ACA CCC 7CC TTC AAA CCC TCC 7CA CTA CTT GCC TTC 
A C 0 t C G R K f A S S £ E R K> 

250 260 270 230 

► « • i 

ACG CAT ACC AAA ATC CAT ACC CC7 CAC AAG CCC ACT AGT OCC CCT GCT 
TCC CTA TCG TT7 TAG CTA TCC CCA GTC TTC CCC TCA TCA CCC CCA CCA 
R HTKTHTGOKPTS ,G C C> 

23d 2O0 3 jo J20 330 

• 

CTC ACC CAC ACC CTG CAC CCC CAA ACC GAC C*C CTC CAA CAC CAA AAA 

CAC TCG C7-G TCC GAC CTC CCC CTT TCC CTC CTC CAC CTT CTC CTT TTT 

L ^ p 7 l 0 A g T D 0 i £ 0 r k> 



FIG.14A 



WO 98/54311 PCT/US98/10801

16 / 22 



M SO 350 J70 3BO 

ICC CCS CTG CM ADC CAA ATC CCC MC CTC Cli*. UA juu fiA* 4 AT 
ACC CCC GAC G7T TGC CT7 TAG CCC fTC GAC GAC TTT CTT T77 C7T TTC 
s * I 0 IEIANILX£KEK> 



•590 «D AID QQ 4iO 

1 » » i i 

CTG CAS TTC A7C CTC CCC C-CA CAC GCT KC CCC CAE CCC Ctt CAC TiC 
GAC TCC C7C TGC CAC GTC CCC CTT TCC- CTC CTC CaC CH CTC CTT TFT 
I E f I L A A H A S C 0 A G 0 Y> 

<40 <iQ <ja 

» < i 

c:: ta: c.£ c-t ccs ca: tac cct tct tm 

G^C AT: CTC Oi CCC CTC ATC CCA ASA ATT 
PYDVPOfAS o 



DECAPEPTIDE TAG



FIG.14B 






WO 98/54311 PCT/US98/10801

17 / 22 

!0 2D 3D 40 

i * r i 

ATC AAA C7C CTC CaC CCD 7aT OCT 7CC CCT GTC CAC TCC TCC CAr CCC 
7AC TIT CAC CAC CTC CSC ATA CCA ACC CCA CAC CTC ACG ACC CTA GCC 

**"KLL CPVACPv£ scor> 

50 60 ?0 SO 90 

OCC TH TCT AAG TCC DC i CAT C7C AAC CCC CAT AIC CGC ATC CAC ACT 
CCC AAA AC-A TTC ACC CCA CTA CAC mC CCC CTA TAG COG TAG GTC- TCA 
RFSJCSA[>lXffHjRlaT> 



Uj HO 120 UC 



GCC GAA A^ CCC 7al CTC Ttt r.n t^n 7r* rrr rrr 

P77 77 T — r ' " r f"^ r^i rif r» t .>* 

) ' I l a i Li Ljs. /v,j IwA L^L ^vg *u-u WW *v.- 

Ci KPrACPVESCDRRr> 

>£0 170 150 190 

« t I t i 

"cr a;.c 7cg c:r cat ctc ^ cc: cat atc clx atc cac azc uzz gag 

ACA 7TC A5C C3A C" A CAC T'C GCC CTA TAG GCC TAG CTC TGC CCC CTC 
SKSADLXPKJ«|HICE> 

200 2io no 2*q 

• 4 » 1 » 

AAC CCC TAT DCT ICC CCT CTC GX TCC TCC CAT CGC CCC TTT TCT .AAC 
i ,C 'juv A. A CCA ACC Cv^ C^C C'C A^C tCj C 7A GCC CDC AAA, ACA TTC 
KFTACPVESCBRRFSK> 

250 250 ;70 2S0 

• • * » 

TOG GCT CAT CTC AAC CGC CAT ATC CCC ATC CAC ACC GET CAG AAG CCC 
XX GGA CTA GAC TTC GCC CTA TAG CCC TAG GTC TCC CCA GTC TTC COG 

S A 0 L K R H [ R [ N T C 0 K P> 

230 
« 

ACT ACT 
TCA TCA 
T S> 



FIG. 15 



WO 98/54311 PCT/US98/10801

18 / 22
10 20 30 ^ 

atg ctc cac ctc ccc txt ki tcc en ctc GA£ 7CC tcc cat ccc ccc 
tac cac ctc cac kg at a CGA acg cga cac ctc acc acc CIA ccc ccc 

Kt[lPrACPVESCO*R> 
SO 50 70 50 3C 

It < < ' 

FT! TCf EC TCC GAT C/C C"T ACC CCC CAf ATC CCC ATC CAt ACA CCC 
AAA AW, CCC ACC CTA CTC C-AA TCC GCC CTA TAG CCC TAo CTC TCT CCC 

100 liO 120 530 KO 

» I * • * 

AAC CCC T7C CAC TCT CCA ATA TCC A7C CGT AAC TTC AG7 COT ACT 

"C TTC CCG AAC CTC ACA CCT TAT ACC TAC CCA TTC AAC TCA CCA TCA 

OK?F OCRI CttffNf SRS> 

160 170 W '-3 

• • • ' 

CAC CAC CTT ACC ACC CAC ATC CCC ACC C'C ACA CCC G« AAC CCT T7T 

CTC CTC CAA TCC TCC CTC "AC CCS TCC CTC TCT CCC C7C TTC GCA .AAA. 

D H L i ( K I R T H T G t K P 

203 310 220 230 240 

, • ■ • • 

CCC TCT CAC ATT TCT CCC ACC *AG TT7 CCC .ACC ACT CAT CM CCC AAC 
CCC ACA C"C TAA ACA CCC TCC TTC AAA CCC TCC TCA CTA CT7 CCC TTC 
ACDIC"CRKFARSDE*K> 

250 JSC 270 260 

, t » • 

ACC CAT ACC AAA ATC CAT ACC CCC CAG AAC CCC TAT CCT TCC CCT G7C 

TCC CTA TGG TTT TAG GTA TCC CCC CTC TTC CCC ATA CCA ACC CCA CAC 

R H T K i K T C E * P Y A C P V> 

290 3W JIB 320 330 

. • » • • 

CAC TCC TCC CAT CCC CCC TTT TCT CCC TCC CAT CAC CTT ACC CCC CAT 
CTC AGE ACC CTA GCC GCC AM ACA CCC ACC CTA CTC CAA TCC GCC CTA 
ESC'DRRrSRSDELTftto 



FIG.16A 






WO 98/54311 PCT/US98/10801

19 / 22 



M 350 3SD 370 389 

• » ♦ < , 

ATC CCC ATC CAC ADA CDC CAC AAC CCC TIC CAC TCT CGA AiA TCC ATG 
T*C CCC TAG GTG FGT CCC CTC TTC CCC AAC C7C AC* CC" T*T AX IAC 
ISlHTGOKPFQCSICto 

J90 400 410 4M 'JQ 

• * « « • 

CGT HC TTC AC7 CCT ACT CAC CAC CTT ACC ACC CAE ATC CGC ACC CAC 
CCA TTC AAC TCA CCA TCA CTC CTC CAA IOC TCG CTC TAC COC TCC CTC 
R N r S R 5 0 H L T T « | R 7 H> 

440 450 460 470 460 

!■ • * ♦ 

ACA EC CAC AAC CCT 77 T CCC TCT GAC ATT 7C7 CCC AX A AC TTT CCC 
TCT CCC CTC TTC GGA AAA C3C ACA CTC TAA ACA CCC TCC TTC AAA CCC 
TCfKPFAC9]CCRK< : 0 

'30 50D J10 ^ 

j t * i 

ACC ACT CAT CM CCC AAS AX CAT ACC AAA ATC CAT TTA AC a CAC M 
TCC ICA CTA CTT CCG TTC TCC CTA TCI TTT Ttf CTA AAT TC" CTC TTC 
RSDrRXRHTK[HLPOX> 

SJO 540 
« * 

CAC TCI ASA ACT ACT 

CTC AGA TCT TCA TCA 

0 S R 7 S> 



FIG.16B







21 / 22 




WO 98/54311 



4*4 



PCT/US98/10801








